Abstract

We consider a trader who aims to liquidate a large position in the presence of an arbitrageur who hopes
to profit from the trader's activity. The arbitrageur is uncertain about the trader's position and learns
from observed price fluctuations. This is a dynamic game with asymmetric information. We present an
algorithm for computing perfect Bayesian equilibrium behavior and conduct numerical experiments. Our
results demonstrate that the trader's strategy differs significantly from one that would be optimal in the
absence of the arbitrageur. In particular, the trader must balance the conflicting desires of minimizing
price impact and minimizing information that is signaled through trading. Accounting for information
signaling and the presence of strategic adversaries can greatly reduce execution costs.



1. Introduction

When buying or selling securities, value is lost through execution costs such as exchange fees, commissions, 
bid-ask spreads, and price impact. The latter can be dramatic and typically dominates other sources of 
execution cost when trading large blocks, when the security is thinly traded, or when there is an urgent 
demand for liquidity. Execution algorithms aim to reduce price impact by partitioning the quantity to be 
traded and placing trades sequentially. Growing recognition of the importance of execution has fueled an
academic literature on the topic as well as the formation of specialized groups at investment banks and other 
organizations to offer execution services. 

Optimal execution algorithms have been developed for a number of models. In the base model of
Bertsimas and Lo (1998), a stock price nominally follows a discrete-time random walk and the market
impact of a trade is permanent and linear in trade size. The authors establish that expected cost is
minimized by an equipartitioning policy. This policy trades equal amounts over time increments within the
trading horizon. Further developments have led to optimal execution algorithms for models that
incorporate price predictions (Bertsimas and Lo, 1998), bid-ask spreads and resilience (Obizhaeva and Wang,
2005; Alfonsi et al., 2007a), nonlinear price impact models (Almgren, 2003; Alfonsi et al., 2007b), and risk
aversion (Subramanian and Jarrow, 2001; Almgren and Chriss, 2000; Dubil, 2002; Huberman and Stanzl,
2005; Engle and Ferstenberg, 2006; Hora, 2006; Almgren and Lorenz, 2006; Schied and Schöneborn, 2007;
Lorenz, 2008).

The aforementioned results offer insight into how one should partition a block and sequence trades 
under various assumptions about market dynamics and objectives. The resulting algorithms, however, are 
unrealistic in that they exhibit predictable behavior. Such predictable behavior allows strategic adversaries, 
which we call arbitrageurs, to "front-run" trades and profit at the expense of increased execution cost. For 
example, consider liquidating a large block by an equipartitioning policy which sells an equal amount during 
each minute of a trading day. Trades early in the day generate abnormal price movements, allowing an 
observing arbitrageur to anticipate further liquidation. If the arbitrageur sells short and closes his position at 
the end of the day, he profits from expected price decreases. The arbitrageur's actions amplify price impact 
and therefore increase execution costs. 

Several recent papers study game-theoretic models of execution in the presence of strategic arbitrageurs
(Brunnermeier and Pedersen, 2005; Carlin et al., 2007; Schöneborn and Schied, 2007). However, these
models involve games with symmetric information, in which arbitrageurs know the position to be liquidated.
In more realistic scenarios, this information would be the private knowledge of the trader, and the arbitrageurs 
would make inferences as to the trader's position based on observed market activity. 

This type of information asymmetry is central to effective execution. The fact that his position is unknown 
to others allows the trader to greatly reduce execution costs. But to do so requires the deliberate management 
of "information leakage", or the signals that are transmitted via trading activity. Further, the desire to 
minimize information signaling may be at odds with the desire to minimize price impact. A model through 
which such signaling can be studied must account for uncertainty among arbitrageurs and their ability to learn 
from observed price fluctuations. In this paper we formulate and study a simple model which we believe to
be the first that meets this requirement. 

The contributions of this paper are as follows: 

1. We formulate the optimal execution problem as a dynamic game with asymmetric information. This 
game involves a trader and a single arbitrageur. Both agents are risk neutral, and market dynamics
evolve according to a linear permanent price impact model. The trader seeks to liquidate his position 
in a finite time horizon. The arbitrageur attempts to infer the position of the trader by observing market 
price movements, and seeks to exploit this information for profit. 

2. We develop an algorithm that computes perfect Bayesian equilibrium behavior. 

3. We demonstrate that the associated equilibrium strategies take on a simple structure: Trades placed by 
the trader are linear in the trader's position, the arbitrageur's position and the arbitrageur's expectation 
of the trader's position. Trades placed by the arbitrageur are linear in the arbitrageur's position and his 
expectation of the trader's position. Equilibrium policies depend on the time horizon and a parameter 
that we call the "relative volume". This parameter captures the magnitude of the per-period activity of 
the trader relative to the exogenous fluctuations of the market. 

4. We present computational results that make several points about perfect Bayesian equilibrium in our 
model: 

(a) In the presence of adversaries, there are significant potential benefits to employing perfect
Bayesian equilibrium strategies.

(b) Unlike strategies proposed based on prior models in the literature, which exhibit deterministic
sequences of trades, trades in a perfect Bayesian equilibrium adaptively respond to price
fluctuations; the trader leverages these random outcomes to conceal his activity.

(c) When the relative volume of the trader's activity is low, in equilibrium, the trader can ignore the 
presence of the arbitrageur and will equipartition to minimize price impact. Alternatively, when 
the relative volume is high, the trader will concentrate his trading activity in a short time interval 
so as to minimize signaling. 

(d) The presence of the arbitrageur leads to a spill-over effect. That is, the trader's expected loss due 
to the arbitrageur's presence is larger than the expected profit of the arbitrageur. Hence, other
market participants benefit from the arbitrageur's activity. 

5. We discuss how the basic model presented can be extended to incorporate a number of additional
features, such as transient price impact and risk aversion. 

Solving for perfect Bayesian equilibrium in dynamic games with asymmetric information is notoriously 
difficult. What facilitates effective computation in our model is that, in equilibrium, each agent solves a 
tractable linear-quadratic Gaussian control problem. Similar approaches based on linear-quadratic Gaussian 
control have previously been used to analyze equilibrium behavior of traders with private information. This
line of work begins with the seminal paper of Kyle (1985), and includes many subsequent papers (e.g.,
Foster and Viswanathan, 1994, 1996; Vayanos, 2001). Among these contributions, Foster and Viswanathan
(1994) come closest to the model and method we propose. In the model of that paper, there are two strategic
traders, many "noise" traders, and a market maker. The strategic traders possess information that is not 
initially reflected in market prices. One trader knows more than the other. The more informed trader adapts 
trades to maximize his expected payoff, and this entails controlling how his private information is revealed 
through price fluctuations. This model parallels ours if we think of the arbitrageur as the less informed trader. 
However, in our model there is no private information about future dividends but instead uncertainty about the
size of the position to be liquidated. Further, in the model of Foster and Viswanathan (1994), trades influence
prices because the market maker tries to infer the traders' private information whereas, in our setting, there
is an exogenously specified price impact model. The algorithm we develop bears some similarity to that of
Foster and Viswanathan (1994), but requires new features designed to address differences in our model.

The remainder of this paper is organized as follows. The next section presents our problem formulation. 
Section 3 discusses how perfect Bayesian equilibrium in this model is characterized by a dynamic program.
A practical algorithm for computing perfect Bayesian equilibrium behavior is developed in Section 4. This
algorithm is applied in computational studies, for which results are presented and interpreted in Section 5.
Several extensions of this model are discussed in Section 6. Finally, Section 7 makes some closing remarks
and suggests directions for future work. Proofs of all theoretical results are presented in the appendices. 

2. Problem Formulation



In this section, the optimal execution problem is formulated as a game of asymmetric information. Our 
formulation makes a number of simplifying assumptions and we omit several factors that are important in 
the practical implementation of execution strategies, for example, transient price impact and risk aversion. 
Our goal here is to highlight the strategic and informational aspects of execution in a streamlined fashion. 
However, these assumptions are discussed in more detail and a number of extensions of this basic model are 
presented in Section 6.

2.1. Game Structure 

Consider a game that evolves over a finite horizon in discrete time steps $t = 0, \ldots, T+1$. There are two
players: a trader and an arbitrageur. The trader begins with a position $x_0 \in \mathbb{R}$ in a stock, which he must
liquidate by time $T$. Denote his position at each time $t$ by $x_t$, and thus require that $x_t = 0$ for $t \ge T$. The
arbitrageur begins with a position $y_0$. Denote his position at each time $t$ by $y_t$. In general, the arbitrageur
has additional flexibility and will not be limited to the same time horizon as the trader. For simplicity, this
flexibility is modeled by assuming that the arbitrageur has one additional period of trading activity. In other
words, though we do require that $y_{T+1} = 0$, we do not require that $y_T = 0$. This assumption will be revisited
in Section 6.1.

2.2. Price Dynamics 

Denote the price of the stock at time $t$ by $p_t$. This price evolves according to the permanent linear price
impact model given by

$$p_t = p_{t-1} + \Delta p_t = p_{t-1} + \lambda (u_t + v_t) + \epsilon_t. \tag{1}$$

Here, $\lambda > 0$ is a parameter that reflects the sensitivity of prices to trade size, and $u_t$ and $v_t$ are, respectively,
the quantities of stock purchased by the trader and the arbitrageur at time $t$. Note that, given the horizon of
the trader, $u_{T+1} = 0$. The positions evolve according to

$$x_t = x_{t-1} + u_t, \quad \text{and} \quad y_t = y_{t-1} + v_t.$$

The sequence $\{\epsilon_t\}$ is a normally distributed IID process with $\epsilon_t \sim N(0, \sigma_\epsilon^2)$, for some $\sigma_\epsilon > 0$. This noise
sequence represents the random and exogenous fluctuations of market prices. We assume that the trading
decisions $u_t$ and $v_t$ are made at time $t-1$, and executed at the price $p_t$ at time $t$. Note that there is no drift
term in the price evolution equation (1). In the intraday horizon of typical optimal execution problems, this is
usually a reasonable assumption. This assumption will be revisited in Section 6.3. Further, the price impact
in (1) is permanent in the sense that it is long-lived relative to the length of the time horizon $T$. In Section 6.3
we will allow for transient price impact as well.
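
As an illustration of the price dynamics in (1), the following minimal Python sketch simulates the permanent linear impact model and accumulates the trader's book-value profit, which in each period is the price change times the position held before the trade. The function names `simulate_trader_profit` and `equipartition` and the parameter values are illustrative assumptions, not part of the model description above. With no arbitrageur and no noise, equipartitioning a position $x_0$ over $T$ periods should yield a profit of $-\lambda x_0^2 (T+1)/(2T)$.

```python
import random


def equipartition(x0, T):
    """Sell an equal fraction of the initial position x0 in each of T periods."""
    return [-x0 / T] * T


def simulate_trader_profit(x0, trades, lam, sigma_eps, arb_trades=None, seed=0):
    """Simulate dp_t = lam * (u_t + v_t) + eps_t and return (profit, final position).

    The trader's profit is the sum over periods of dp_t times the position
    held before the trade at time t (the change in book value).
    """
    rng = random.Random(seed)
    if arb_trades is None:
        arb_trades = [0.0] * len(trades)  # assumption: no arbitrageur activity
    x, profit = x0, 0.0
    for u, v in zip(trades, arb_trades):
        eps = rng.gauss(0.0, sigma_eps) if sigma_eps > 0 else 0.0
        dp = lam * (u + v) + eps
        profit += dp * x  # position held before this period's trade
        x += u
    return profit, x


if __name__ == "__main__":
    x0, T, lam = 100.0, 10, 0.01
    profit, x_final = simulate_trader_profit(x0, equipartition(x0, T), lam, sigma_eps=0.0)
    print(profit, x_final)
```

Passing a nonzero `arb_trades` sequence of sales in the same direction makes each price change more negative, which is the front-running effect described in the introduction.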

2.3. Information Structure 

The information structure of the game is as follows. The dynamics of the game (in particular, the parameters
$\lambda$ and $\sigma_\epsilon$) and the time horizon $T$ are mutually known. From the perspective of the arbitrageur, the initial
position $x_0$ of the trader is unknown. Further, the trader's actions $u_t$ are not directly observed. However,
the arbitrageur begins with a prior distribution $\phi_0$ on the trader's initial position $x_0$. As the game evolves
over time, the arbitrageur observes the price change $\Delta p_t$ at each time $t$. The arbitrageur updates his beliefs
based on these price movements, at any time $t$ maintaining a posterior distribution $\phi_t$ of the trader's current
position $x_t$, based on his observation of the history of the game up to and including time $t$.

From the trader's perspective, it is assumed that everything is known. This is motivated by the fact that
the arbitrageur's initial position $y_0$ will typically be zero and the trader can go through the same inference
process as the arbitrageur to arrive at the prior distribution $\phi_0$. Given a prescribed policy for the arbitrageur
(for example, in equilibrium), the trader can subsequently reconstruct the arbitrageur's positions and beliefs
over time, given the public observations of market price movements. We do make the assumption, however,
that any deviations on the part of the arbitrageur from his prescribed policy will not mislead the trader. In
our context, this assumption is important for tractability. We discuss the situation where this assumption is
relaxed, and the trader does not have perfect knowledge of the arbitrageur's positions and beliefs, in Section 7.

2.4. Policies 

The trader's purchases are governed by a policy, which is a sequence of functions $\pi = \{\pi_1, \ldots, \pi_T\}$. Each
function $\pi_{t+1}$ maps $x_t$, $y_t$, and $\phi_t$ to a decision $u_{t+1}$ at time $t$. Similarly, the arbitrageur follows a policy
$\psi = \{\psi_1, \ldots, \psi_{T+1}\}$. Each function $\psi_{t+1}$ maps $y_t$ and $\phi_t$ to a decision $v_{t+1}$ made at time $t$. Since policies
for the trader and arbitrageur must result in liquidation, we require that $\pi_T(x_{T-1}, y_{T-1}, \phi_{T-1}) = -x_{T-1}$
and $\psi_{T+1}(y_T, \phi_T) = -y_T$. Denote the set of trader policies by $\Pi$ and the set of arbitrageur policies by $\Psi$.

Note that implicit in the above description is the restriction to policies that are Markovian in the following 
sense: the state of the game at time $t$ is summarized for the trader and arbitrageur by the tuples $(x_t, y_t, \phi_t)$
and $(y_t, \phi_t)$, respectively, and each player's action is only a function of his state. Further, the policies are
pure strategies in the sense that, as a function of the player's state, the actions are deterministic. In general, 
one may wish to consider policies which determine actions as a function of the entire history of the game up 
to a given time, and allow randomization over the choice of action. Our assumptions will exclude equilibria 
from this more general class. However, it will be the case that for the equilibria that we do find, arbitrary 
deviations that are history dependent and/or randomized will not be profitable. 

If the arbitrageur applies an action $v_t$ and assumes the trader uses a policy $\pi \in \Pi$, then upon observation
of $\Delta p_t$ at time $t$, the arbitrageur's beliefs are updated in a Bayesian fashion according to

$$\phi_t(S) = \Pr\left( x_t \in S \mid \phi_{t-1},\ y_{t-1},\ \lambda\big(\pi_t(x_{t-1}, y_{t-1}, \phi_{t-1}) + v_t\big) + \epsilon_t = \Delta p_t \right), \tag{2}$$

for all measurable sets $S \subset \mathbb{R}$. Note that $\Delta p_t$ here is an observed numerical value which could have resulted
from a trader action $u_t \ne \pi_t(x_{t-1}, y_{t-1}, \phi_{t-1})$. As such, the trader is capable of misleading the arbitrageur
to distort his posterior distribution $\phi_t$.

2.5. Objectives 

Assume that both the trader and the arbitrageur are risk neutral and seek to maximize their expected profits
(this assumption will be revisited in Section 6.2). Profit is computed according to the change of book value,
which is the sum of a player's cash position and asset position, valued at the prevailing market price. Hence,
the profits generated by the trader and arbitrageur between time $t$ and time $t+1$ are, respectively,

$$p_{t+1} x_{t+1} - p_{t+1} u_{t+1} - p_t x_t = \Delta p_{t+1} x_t, \quad \text{and} \quad p_{t+1} y_{t+1} - p_{t+1} v_{t+1} - p_t y_t = \Delta p_{t+1} y_t.$$

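If the trader uses policy $\pi$ and the arbitrageur uses policy $\psi$ and assumes the trader uses policy $\tilde{\pi}$, the
trader expects profits

$$U_t^{\pi,(\psi,\tilde{\pi})}(x_t, y_t, \phi_t) \triangleq \mathrm{E}^{\pi,(\psi,\tilde{\pi})}\left[ \sum_{\tau=t}^{T-1} \Delta p_{\tau+1} x_\tau \,\middle|\, x_t, y_t, \phi_t \right]$$

over times $\tau = t+1, \ldots, T$. Here, the superscripts indicate that trades are executed based on $\pi$ and $\psi$, while
beliefs are updated based on $\tilde{\pi}$. Similarly, the arbitrageur expects profits

$$V_t^{\psi,\tilde{\pi}}(y_t, \phi_t) \triangleq \mathrm{E}^{\psi,\tilde{\pi}}\left[ \sum_{\tau=t}^{T} \Delta p_{\tau+1} y_\tau \,\middle|\, y_t, \phi_t \right]$$

over times $\tau = t+1, \ldots, T+1$. Here, the conditioning in the expectation implicitly assumes that $x_t$ is
distributed according to $\phi_t$.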

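Note that $-U_0^{\pi,(\psi,\tilde{\pi})}(x_0, y_0, \phi_0)$ is the trader's expected execution cost. For practical choices of $\pi$, $\psi$,
and $\tilde{\pi}$, we expect this quantity to be positive since the trader is likely to sell his shares for less than the initial
price. To compress notation, for any $\pi$, $\psi$, and $t$, let

$$U_t^{\pi,\psi} \triangleq U_t^{\pi,(\psi,\pi)}$$

denote the trader's value when the policy assumed by the arbitrageur coincides with the policy the trader
actually uses, and let $V_t^{\psi,\pi}$ denote the arbitrageur's value in the same situation.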



2.6. Equilibrium Concept 



As a solution concept, we consider perfect Bayesian equilibrium (Fudenberg and Tirole, 1991). This is a
refinement of Nash equilibrium that rules out implausible outcomes by requiring subgame perfection and
consistency with Bayesian belief updates. In particular, a policy $\pi \in \Pi$ is a best response to $(\psi, \tilde{\pi}) \in \Psi \times \Pi$
if

$$U_t^{\pi,(\psi,\tilde{\pi})}(x_t, y_t, \phi_t) = \max_{\pi' \in \Pi} U_t^{\pi',(\psi,\tilde{\pi})}(x_t, y_t, \phi_t), \tag{3}$$

for all $t$, $x_t$, $y_t$, and $\phi_t$. Similarly, a policy $\psi \in \Psi$ is a best response to $\pi \in \Pi$ if

$$V_t^{\psi,\pi}(y_t, \phi_t) = \max_{\psi' \in \Psi} V_t^{\psi',\pi}(y_t, \phi_t), \tag{4}$$

for all $t$, $y_t$, and $\phi_t$. We define perfect Bayesian equilibrium, specialized to our context, as follows:

Definition 1. A perfect Bayesian equilibrium (PBE) is a pair of policies $(\pi^*, \psi^*) \in \Pi \times \Psi$ such that:

1. $\pi^*$ is a best response to $(\psi^*, \pi^*)$;

2. $\psi^*$ is a best response to $\pi^*$.

In a PBE, each player's action at time $t$ depends on positions $x_t$ and/or $y_t$ and the belief distribution $\phi_t$.
These arguments, especially the distribution, make computation and representation of a PBE challenging. We
will settle for a more modest goal. We compute policy actions only for cases where $\phi_t$ is Gaussian. When the
initial distribution $\phi_0$ is Gaussian and players employ these PBE policies, we require that subsequent belief
distributions $\phi_t$ determined by Bayes' rule (2) also be Gaussian. As such, computation of PBE policies over
the restricted domain of Gaussian distributions is sufficient to characterize equilibrium behavior given any
initial conditions involving a Gaussian prior. To formalize our approach, we now define a solution concept.

Definition 2. A policy $\pi \in \Pi$ (or $\psi \in \Psi$) is a Gaussian best response to $(\psi, \tilde{\pi}) \in \Psi \times \Pi$ (or $\pi \in \Pi$)
if (3) (or (4)) holds for all $t$, $x_t$, $y_t$, and Gaussian $\phi_t$. A Gaussian perfect Bayesian equilibrium is a pair
$(\pi^*, \psi^*) \in \Pi \times \Psi$ of policies such that

1. $\pi^*$ is a Gaussian best response to $(\psi^*, \pi^*)$;

2. $\psi^*$ is a Gaussian best response to $\pi^*$;

3. if $\phi_0$ is Gaussian and the arbitrageur assumes the trader uses $\pi^*$ then, independent of the true actions of
the trader, the beliefs $\phi_1, \ldots, \phi_{T-1}$ are Gaussian.

Note that when Gaussian PBE policies are used and the prior $\phi_0$ is Gaussian, the system behavior is
indistinguishable from that of a PBE since the policies produce actions that concur with PBE policies at all 
states that are visited. 

Given a belief distribution $\phi_t$, define the quantities

$$\mu_t \triangleq \mathrm{E}[x_t \mid \phi_t], \qquad \sigma_t^2 \triangleq \mathrm{E}[(x_t - \mu_t)^2 \mid \phi_t], \qquad \rho_t \triangleq \lambda \sigma_t / \sigma_\epsilon.$$

Since $\lambda$ and $\sigma_\epsilon$ are constants, $\rho_t$ is simply a scaled version of the standard deviation $\sigma_t$. The ratio $\lambda/\sigma_\epsilon$ acts
as a normalizing constant that accounts for the informativeness of observations. The reason we consider this
scaling is that it highlights certain invariants across problem instances. In Section 5.2 we will interpret the
value of $\rho_0$ as the relative volume of the trader's activity in the marketplace. For the moment, it is sufficient
to observe that if the distribution $\phi_t$ is Gaussian, it is characterized by $(\mu_t, \rho_t)$.

3. Dynamic Programming Analysis 

In this section, we develop abstract dynamic programming algorithms for computing PBE and Gaussian PBE. 
We also discuss structural properties of associated value functions. The dynamic programming recursion 
relies on the computation of equilibria for single-stage games, and we also discuss the existence of such 
equilibria. The algorithms of this section are not implementable, but their treatment motivates the design of 
a practical algorithm that will be presented in the next section. 

3.1. Stage-Wise Decomposition

The process of computing a PBE and the corresponding value functions can be decomposed into a series of
single-stage equilibrium problems via a dynamic programming backward recursion. We begin by defining
some notation. For each $\pi_t$, $\psi_t$, and $u_t$, define a dynamic programming operator $F_{u_t}^{(\psi_t,\pi_t)}$ by

$$\big(F_{u_t}^{(\psi_t,\pi_t)} U\big)(x_{t-1}, y_{t-1}, \phi_{t-1}) \triangleq \mathrm{E}^{u_t,(\psi_t,\pi_t)}\left[ \lambda(u_t + v_t) x_{t-1} + U(x_t, y_t, \phi_t) \mid x_{t-1}, y_{t-1}, \phi_{t-1} \right],$$

for all functions $U$, where $x_t = x_{t-1} + u_t$, $y_t = y_{t-1} + v_t$, $v_t = \psi_t(y_{t-1}, \phi_{t-1})$, and $\phi_t$ results from the
Bayesian update (2) given that the arbitrageur assumes the trader trades $\pi_t(x_{t-1}, y_{t-1}, \phi_{t-1})$ while the trader
actually trades $u_t$. Similarly, for each $\pi_t$ and $v_t$, define a dynamic programming operator $G_{v_t}^{\pi_t}$ by

$$\big(G_{v_t}^{\pi_t} V\big)(y_{t-1}, \phi_{t-1}) \triangleq \mathrm{E}^{v_t,\pi_t}\left[ \lambda(u_t + v_t) y_{t-1} + V(y_t, \phi_t) \mid y_{t-1}, \phi_{t-1} \right],$$

for all functions $V$, where $y_t = y_{t-1} + v_t$, $u_t = \pi_t(x_{t-1}, y_{t-1}, \phi_{t-1})$, $x_{t-1}$ is distributed according to the
belief $\phi_{t-1}$, and $\phi_t$ results from the Bayesian update (2) given that the arbitrageur correctly assumes the
trader trades $u_t$.

Algorithm 1 PBE Solver

1: Initialize the terminal value functions $U^*_{T-1}$ and $V^*_{T-1}$ according to (5)-(6)
2: for $t = T-1, T-2, \ldots, 1$ do
3:   Compute $(\pi_t^*, \psi_t^*)$ such that for all $x_{t-1}$, $y_{t-1}$, and $\phi_{t-1}$,
       $\pi_t^*(x_{t-1}, y_{t-1}, \phi_{t-1}) \in \operatorname{argmax}_{u_t} \big(F_{u_t}^{(\psi_t^*,\pi_t^*)} U_t^*\big)(x_{t-1}, y_{t-1}, \phi_{t-1})$
       $\psi_t^*(y_{t-1}, \phi_{t-1}) \in \operatorname{argmax}_{v_t} \big(G_{v_t}^{\pi_t^*} V_t^*\big)(y_{t-1}, \phi_{t-1})$
4:   Compute the value functions at the previous time step by setting, for all $x_{t-1}$, $y_{t-1}$, and $\phi_{t-1}$,
       $U^*_{t-1}(x_{t-1}, y_{t-1}, \phi_{t-1}) \leftarrow \big(F_{\pi_t^*}^{(\psi_t^*,\pi_t^*)} U_t^*\big)(x_{t-1}, y_{t-1}, \phi_{t-1})$
       $V^*_{t-1}(y_{t-1}, \phi_{t-1}) \leftarrow \big(G_{\psi_t^*}^{\pi_t^*} V_t^*\big)(y_{t-1}, \phi_{t-1})$
5: end for

Consider Algorithm 1 for computing a PBE. In Step 1, the algorithm begins by initializing the terminal
value functions $U^*_{T-1}$ and $V^*_{T-1}$. These terminal value functions have a simple closed form in equilibrium.
This is because, at time $T$, the trader must liquidate his position, hence $\pi_T^*(x_{T-1}, y_{T-1}, \phi_{T-1}) = -x_{T-1}$.
Similarly, the arbitrageur must liquidate his position over times $T$ and $T+1$. In equilibrium, he will do so
optimally, thus his value function takes the form

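$$V^*_{T-1}(y_{T-1}, \phi_{T-1}) = \max_{v_T} \mathrm{E}\left[ \lambda(-x_{T-1} + v_T) y_{T-1} - \lambda (y_{T-1} + v_T)^2 \mid y_{T-1}, \phi_{T-1} \right] = -\lambda \left( \mu_{T-1} + \tfrac{3}{4} y_{T-1} \right) y_{T-1}, \tag{5}$$

where the optimizing decision is $\psi_T^*(y_{T-1}, \phi_{T-1}) = -\tfrac{1}{2} y_{T-1}$. It is straightforward to derive the corresponding
expression of the trader's value function,

$$U^*_{T-1}(x_{T-1}, y_{T-1}, \phi_{T-1}) = \mathrm{E}\left[ \lambda \left( -x_{T-1} - \tfrac{1}{2} y_{T-1} \right) x_{T-1} \mid x_{T-1}, y_{T-1}, \phi_{T-1} \right] = -\lambda \left( x_{T-1} + \tfrac{1}{2} y_{T-1} \right) x_{T-1}. \tag{6}$$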
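
The terminal-stage expressions can be checked numerically. The short Python sketch below, in which the function names and test values are illustrative assumptions, evaluates the arbitrageur's expected profit over times $T$ and $T+1$ as a function of his trade $v_T$ and compares the value at $v_T = -\tfrac{1}{2} y_{T-1}$ with the closed form in (5). It also encodes the trader's closed form (6).

```python
def arb_terminal_objective(v_T, y_prev, mu_prev, lam):
    """Arbitrageur's expected profit over times T and T+1 for a trade v_T at time T.

    The trader sells his whole position at T (expected trade -mu_prev), and the
    arbitrageur liquidates his remaining position y_prev + v_T at time T+1.
    """
    return lam * (-mu_prev + v_T) * y_prev - lam * (y_prev + v_T) ** 2


def arb_terminal_value(y_prev, mu_prev, lam):
    """Closed form (5): -lam * (mu + 3/4 * y) * y."""
    return -lam * (mu_prev + 0.75 * y_prev) * y_prev


def trader_terminal_value(x_prev, y_prev, lam):
    """Closed form (6): -lam * (x + 1/2 * y) * x."""
    return -lam * (x_prev + 0.5 * y_prev) * x_prev


if __name__ == "__main__":
    y, mu, lam = 40.0, 100.0, 0.01
    best = arb_terminal_objective(-0.5 * y, y, mu, lam)
    print(best, arb_terminal_value(y, mu, lam))
```

The objective is a concave quadratic in $v_T$, so the first-order condition $\lambda y_{T-1} - 2\lambda(y_{T-1} + v_T) = 0$ identifies the maximizer.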

At each time $t < T$, equilibrium policies must satisfy the best-response conditions (3)-(4). Given the
value functions $U_t^*$ and $V_t^*$, these conditions decompose recursively according to Step 3. Given such a
pair $(\pi_t^*, \psi_t^*)$, the value functions $U^*_{t-1}$ and $V^*_{t-1}$ for the prior time period are, in turn, computed in Step 4.

It is easy to see that, so long as Step 3 is carried out successfully each time it is invoked, the algorithm
produces a PBE $(\pi^*, \psi^*)$ along with value functions $U_t^* = U_t^{\pi^*,\psi^*}$ and $V_t^* = V_t^{\psi^*,\pi^*}$. However, the algorithm
is not implementable. For starters, the functions $\pi_t^*$, $\psi_t^*$, $U^*_{t-1}$, and $V^*_{t-1}$, which must be computed and stored,
have infinite domains.

3.2. Linear Policies

Consider the following class of policies:

Definition 3. A function $\pi_t$ is linear if there are coefficients $a_{x,t}^{\rho_{t-1}}$, $a_{y,t}^{\rho_{t-1}}$, and $a_{\mu,t}^{\rho_{t-1}}$, which are functions of
$\rho_{t-1}$, such that

$$\pi_t(x_{t-1}, y_{t-1}, \phi_{t-1}) = a_{x,t}^{\rho_{t-1}} x_{t-1} + a_{y,t}^{\rho_{t-1}} y_{t-1} + a_{\mu,t}^{\rho_{t-1}} \mu_{t-1}, \tag{7}$$

for all $x_{t-1}$, $y_{t-1}$, and $\phi_{t-1}$. Similarly, a function $\psi_t$ is linear if there are coefficients $b_{y,t}^{\rho_{t-1}}$ and $b_{\mu,t}^{\rho_{t-1}}$, which
are functions of $\rho_{t-1}$, such that

$$\psi_t(y_{t-1}, \phi_{t-1}) = b_{y,t}^{\rho_{t-1}} y_{t-1} + b_{\mu,t}^{\rho_{t-1}} \mu_{t-1}, \tag{8}$$

for all $y_{t-1}$ and $\phi_{t-1}$. A policy is linear if the component functions associated with times $1, \ldots, T-1$ are
linear.

By restricting attention to linear policies and Gaussian beliefs, we can apply an algorithm similar to that
presented in the previous section to compute a Gaussian PBE. In particular, consider Algorithm 2. This
algorithm aims to compute a single-stage equilibrium that is linear. Further, actions and values are only
computed and stored for elements of the domain for which $\phi_{t-1}$ is Gaussian. This is only viable if the
iterates $U_t^*$ and $V_t^*$, which are computed only for Gaussian $\phi_t$, provide sufficient information for subsequent
computations. This is indeed the case, as a consequence of the following result.

Theorem 1. If the belief distribution $\phi_{t-1}$ at time $t-1$ is Gaussian, and the arbitrageur assumes that the trader's
policy $\pi_t$ is linear with $\pi_t(x_{t-1}, y_{t-1}, \phi_{t-1}) = a_{x,t}^{\rho_{t-1}} x_{t-1} + a_{y,t}^{\rho_{t-1}} y_{t-1} + a_{\mu,t}^{\rho_{t-1}} \mu_{t-1}$, then the belief distribution
$\phi_t$ is also Gaussian. The mean $\mu_t$ is a linear function of $y_{t-1}$, $\mu_{t-1}$, and the observed price change $\Delta p_t$,
with coefficients that are deterministic functions of the scaled variance $\rho_{t-1}$. The scaled variance $\rho_t$ evolves
according to

$$\rho_t^2 = \left(1 + a_{x,t}^{\rho_{t-1}}\right)^2 \left( \frac{1}{\rho_{t-1}^2} + \left(a_{x,t}^{\rho_{t-1}}\right)^2 \right)^{-1}. \tag{9}$$

In particular, $\rho_t$ is a deterministic function of $\rho_{t-1}$.
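
The update in Theorem 1 is a standard linear-Gaussian (Kalman-style) conditioning step, and the following Python sketch shows one way it might be computed. The function names `belief_update` and `rho_next` are illustrative assumptions, and the formula used for the mean is derived here from standard Gaussian conditioning on the observation $\Delta p_t$ rather than quoted from the theorem. The sketch lets one confirm numerically that the variance produced by direct conditioning agrees with recursion (9).

```python
import math


def rho_next(rho_prev, a_x):
    """Recursion (9) for the scaled standard deviation of the belief."""
    return math.sqrt((1 + a_x) ** 2 / (1 / rho_prev ** 2 + a_x ** 2))


def belief_update(mu_prev, rho_prev, y_prev, v, dp, a_x, a_y, a_mu, lam, sigma_eps):
    """Gaussian belief update, assuming the trader follows the linear policy (a_x, a_y, a_mu).

    Observation model: dp = lam * (a_x * x + a_y * y + a_mu * mu + v) + eps.
    Returns the posterior (mu_t, rho_t) for x_t = x_{t-1} + u_t.
    """
    sigma_prev = rho_prev * sigma_eps / lam  # undo the scaling rho = lam * sigma / sigma_eps
    h = lam * a_x  # sensitivity of the observation to x_{t-1}
    predicted = lam * (a_x * mu_prev + a_y * y_prev + a_mu * mu_prev + v)
    denom = h ** 2 * sigma_prev ** 2 + sigma_eps ** 2
    gain = sigma_prev ** 2 * h / denom
    post_mean = mu_prev + gain * (dp - predicted)  # posterior mean of x_{t-1}
    post_var = sigma_prev ** 2 * sigma_eps ** 2 / denom  # posterior variance of x_{t-1}
    # Under the assumed policy, x_t = (1 + a_x) * x_{t-1} + a_y * y + a_mu * mu.
    mu_t = (1 + a_x) * post_mean + a_y * y_prev + a_mu * mu_prev
    sigma_t = abs(1 + a_x) * math.sqrt(post_var)
    return mu_t, lam * sigma_t / sigma_eps


if __name__ == "__main__":
    print(belief_update(100.0, 2.0, 5.0, -1.0, -0.155, -0.2, 0.1, 0.05, 0.01, 0.5))
    print(rho_next(2.0, -0.2))
```

Note that the returned $\rho_t$ does not depend on the observed price change, which is why the sequence of scaled variances can be computed in advance once the coefficients $a_{x,t}$ are known.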

It follows from this result that if $\pi_t^*$ is linear then, for Gaussian $\phi_{t-1}$, $F_{u_t}^{(\psi_t^*,\pi_t^*)} U_t^*$ only depends on values
of $U_t^*$ evaluated at Gaussian $\phi_t$. Similarly, if $\pi_t^*$ is linear then, for Gaussian $\phi_{t-1}$, $G_{v_t}^{\pi_t^*} V_t^*$ only depends
on values of $V_t^*$ evaluated at Gaussian $\phi_t$. It also follows from this theorem that Algorithm 2, which only
computes actions and values for Gaussian beliefs, results in a Gaussian PBE $(\pi^*, \psi^*)$. We should mention,
though, that Algorithm 2 is still not implementable since the restricted domains of $U_t^*$ and $V_t^*$ remain infinite.

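Algorithm 2 Linear-Gaussian PBE Solver

1: Initialize the terminal value functions $U^*_{T-1}$ and $V^*_{T-1}$ according to (5)-(6)
2: for $t = T-1, T-2, \ldots, 1$ do
3:   Compute linear $(\pi_t^*, \psi_t^*)$ such that for all $x_{t-1}$, $y_{t-1}$, and Gaussian $\phi_{t-1}$,
       $\pi_t^*(x_{t-1}, y_{t-1}, \phi_{t-1}) \in \operatorname{argmax}_{u_t} \big(F_{u_t}^{(\psi_t^*,\pi_t^*)} U_t^*\big)(x_{t-1}, y_{t-1}, \phi_{t-1})$
       $\psi_t^*(y_{t-1}, \phi_{t-1}) \in \operatorname{argmax}_{v_t} \big(G_{v_t}^{\pi_t^*} V_t^*\big)(y_{t-1}, \phi_{t-1})$
4:   Compute the value functions at the previous time step by setting, for all $x_{t-1}$, $y_{t-1}$, and Gaussian $\phi_{t-1}$,
       $U^*_{t-1}(x_{t-1}, y_{t-1}, \phi_{t-1}) \leftarrow \big(F_{\pi_t^*}^{(\psi_t^*,\pi_t^*)} U_t^*\big)(x_{t-1}, y_{t-1}, \phi_{t-1})$
       $V^*_{t-1}(y_{t-1}, \phi_{t-1}) \leftarrow \big(G_{\psi_t^*}^{\pi_t^*} V_t^*\big)(y_{t-1}, \phi_{t-1})$
5: end for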



Motivated by these observations, for the remainder of the paper, we will focus on computing equilibria 
of the following form: 

Definition 4. A pair of policies $(\pi^*, \psi^*) \in \Pi \times \Psi$ is a linear-Gaussian perfect Bayesian equilibrium if it is a
Gaussian PBE and each policy is linear.

3.3. Quadratic Value Functions 

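Closely associated with linear policies is the following class of value functions:

Definition 5. A function $U_t$ is trader-quadratic-decomposable (TQD) if there are coefficients $c_{xx,t}^{\rho_t}$, $c_{yy,t}^{\rho_t}$,
$c_{\mu\mu,t}^{\rho_t}$, $c_{xy,t}^{\rho_t}$, $c_{x\mu,t}^{\rho_t}$, $c_{y\mu,t}^{\rho_t}$, and $c_{0,t}^{\rho_t}$, which are functions of $\rho_t$, such that

$$U_t(x_t, y_t, \phi_t) = -\lambda \left( \tfrac{1}{2} c_{xx,t}^{\rho_t} x_t^2 + \tfrac{1}{2} c_{yy,t}^{\rho_t} y_t^2 + \tfrac{1}{2} c_{\mu\mu,t}^{\rho_t} \mu_t^2 + c_{xy,t}^{\rho_t} x_t y_t + c_{x\mu,t}^{\rho_t} x_t \mu_t + c_{y\mu,t}^{\rho_t} y_t \mu_t - \frac{\sigma_\epsilon^2}{\lambda^2} c_{0,t}^{\rho_t} \right), \tag{10}$$

for all $x_t$, $y_t$, and $\phi_t$. A function $V_t$ is arbitrageur-quadratic-decomposable (AQD) if there are coefficients
$d_{yy,t}^{\rho_t}$, $d_{\mu\mu,t}^{\rho_t}$, $d_{y\mu,t}^{\rho_t}$, and $d_{0,t}^{\rho_t}$, which are functions of $\rho_t$, such that

$$V_t(y_t, \phi_t) = -\lambda \left( \tfrac{1}{2} d_{yy,t}^{\rho_t} y_t^2 + \tfrac{1}{2} d_{\mu\mu,t}^{\rho_t} \mu_t^2 + d_{y\mu,t}^{\rho_t} y_t \mu_t - \frac{\sigma_\epsilon^2}{\lambda^2} d_{0,t}^{\rho_t} \right), \tag{11}$$

for all $y_t$ and $\phi_t$.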
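
To make the quadratic parameterization concrete, the Python sketch below evaluates value functions of the TQD and AQD type from a small dictionary of coefficients. The dictionary keys, function names, and the scaling $\sigma_\epsilon^2/\lambda^2$ applied to the constant term are assumptions made for illustration. The terminal coefficients are read off from the closed forms $-\lambda(x_{T-1} + \tfrac{1}{2} y_{T-1}) x_{T-1}$ and $-\lambda(\mu_{T-1} + \tfrac{3}{4} y_{T-1}) y_{T-1}$ derived in Section 3.1, for which the constant term vanishes.

```python
def tqd_value(c, x, y, mu, lam, sigma_eps):
    """Evaluate a trader-quadratic-decomposable value function from its coefficients."""
    quad = (0.5 * c["xx"] * x * x + 0.5 * c["yy"] * y * y + 0.5 * c["mumu"] * mu * mu
            + c["xy"] * x * y + c["xmu"] * x * mu + c["ymu"] * y * mu
            - (sigma_eps / lam) ** 2 * c["0"])  # assumed scaling of the constant term
    return -lam * quad


def aqd_value(d, y, mu, lam, sigma_eps):
    """Evaluate an arbitrageur-quadratic-decomposable value function from its coefficients."""
    quad = (0.5 * d["yy"] * y * y + 0.5 * d["mumu"] * mu * mu + d["ymu"] * y * mu
            - (sigma_eps / lam) ** 2 * d["0"])  # assumed scaling of the constant term
    return -lam * quad


# Terminal coefficients implied by the closed-form terminal value functions.
TERMINAL_C = {"xx": 2.0, "yy": 0.0, "mumu": 0.0, "xy": 0.5, "xmu": 0.0, "ymu": 0.0, "0": 0.0}
TERMINAL_D = {"yy": 1.5, "mumu": 0.0, "ymu": 1.0, "0": 0.0}

if __name__ == "__main__":
    print(tqd_value(TERMINAL_C, 100.0, 40.0, 90.0, 0.01, 0.5))
    print(aqd_value(TERMINAL_D, 40.0, 100.0, 0.01, 0.5))
```

For a fixed value of $\rho_t$, a value function of this type is therefore encoded by seven numbers for the trader and four for the arbitrageur.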

In equilibrium, $U^*_{T-1}$ and $V^*_{T-1}$ are given by Step 1 of Algorithm 2, and hence are TQD/AQD. The
following theorem captures how TQD and AQD structure is preserved in the dynamic programming recursion
given linear policies.

Theorem 2. If $U_t^*$ is TQD and $V_t^*$ is AQD, and Step 3 of Algorithm 2 produces a linear pair $(\pi_t^*, \psi_t^*)$, then
$U^*_{t-1}$ and $V^*_{t-1}$, defined by Step 4 of Algorithm 2, are TQD and AQD, respectively.

Hence, each pair of value functions generated by Algorithm 2 is TQD/AQD. A great benefit of this property
comes from the fact that, for a fixed value of pt, each associated value function can be encoded using just a 
few parameters. 

3.4. Simplified Conditions for Equilibrium 

Algorithm 2 relies for each $t$ on the existence of a pair $(\pi_t^*, \psi_t^*)$ of linear functions that satisfy single-stage
equilibrium conditions. In general, this would require verifying that each policy function is the Gaussian
best response for all possible states. The following theorem provides a much simpler set of conditions. In
Section 4 we will exploit these conditions in order to compute equilibrium policies.

Theorem 3. Suppose that $U_t^*$ and $V_t^*$ are TQD/AQD value functions specified by (10)-(11), and $(\pi_t^*, \psi_t^*)$
are linear policies specified by (7)-(8). Assume that, for all $\rho_{t-1}$, the policy coefficients satisfy the first order
conditions

"ill/ f "1/1/ * 



(12) 



(13) 
(14) 
(15) 



-1 

2 ' 



and the second order conditions 

(16) 4%,t + («* + + > 0' > 0' 

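where the quantities $\alpha_t$ and $\beta_t$ satisfy

Then, $(\pi_t^*, \psi_t^*)$ satisfy the single-stage equilibrium conditions

$$\pi_t^*(x_{t-1}, y_{t-1}, \phi_{t-1}) \in \operatorname{argmax}_{u_t} \big(F_{u_t}^{(\psi_t^*,\pi_t^*)} U_t^*\big)(x_{t-1}, y_{t-1}, \phi_{t-1}),$$

$$\psi_t^*(y_{t-1}, \phi_{t-1}) \in \operatorname{argmax}_{v_t} \big(G_{v_t}^{\pi_t^*} V_t^*\big)(y_{t-1}, \phi_{t-1}),$$

for all $x_{t-1}$, $y_{t-1}$, and Gaussian $\phi_{t-1}$.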

Note that, while this theorem provides sufficient conditions for linear policies satisfying equilibrium
conditions, it does not guarantee the existence or uniqueness of such policies. These remain open issues.
However, we support the plausibility of existence through the following result on Gaussian best responses to
linear policies. It asserts that, if $\psi_t$ and $\tilde{\pi}_t$ are linear, then there is a linear best response $\pi_t$ for the trader in
the single-stage game. Similarly, if $\pi_t$ is linear then there is a linear best response $\psi_t$ for the arbitrageur in
the single-stage game.

Theorem 4. If $U_t$ is TQD, $\psi_t$ is linear, and $\tilde{\pi}_t$ is linear, then there exists a linear $\pi_t$ such that

$$\pi_t(x_{t-1}, y_{t-1}, \phi_{t-1}) \in \operatorname{argmax}_{u_t} \big(F_{u_t}^{(\psi_t,\tilde{\pi}_t)} U_t\big)(x_{t-1}, y_{t-1}, \phi_{t-1}),$$

for all $x_{t-1}$, $y_{t-1}$, and Gaussian $\phi_{t-1}$, so long as the optimization problem is bounded. Similarly, if $V_t$ is
AQD and $\pi_t$ is linear then there exists a linear $\psi_t$ such that

$$\psi_t(y_{t-1}, \phi_{t-1}) \in \operatorname{argmax}_{v_t} \big(G_{v_t}^{\pi_t} V_t\big)(y_{t-1}, \phi_{t-1}),$$

for all $y_{t-1}$ and Gaussian $\phi_{t-1}$, so long as the optimization problem is bounded.

Based on these results, if the trader (arbitrageur) assumes that the arbitrageur (trader) uses a linear policy then
it suffices for the trader (arbitrageur) to restrict himself to linear policies. Though not a proof of existence,
this observation that the set of linear policies is closed under the operation of best response motivates an aim
to compute linear-Gaussian PBE.

4. Algorithm 

The previous section presented abstract algorithms and results that lay the groundwork for the development 
of a practical algorithm which we will present in this section. We begin by discussing a parsimonious 
representation of policies.

4.1. Representation of Policies

Algorithm 2 takes as input three values that parameterize our model: $(\lambda, \sigma_\epsilon, T)$. The algorithm output can be encoded in terms of coefficients $\{a^{\rho_{t-1}}_{x,t}, a^{\rho_{t-1}}_{y,t}, a^{\rho_{t-1}}_{\mu,t}, b^{\rho_{t-1}}_{y,t}, b^{\rho_{t-1}}_{\mu,t}\}$, for every $\rho_{t-1} > 0$ and each time step$^1$ $t = 1, \ldots, T-1$. These coefficients parameterize linear-Gaussian PBE policies. Note that the output depends on $\lambda$ and $\sigma_\epsilon$ only through $\rho_t$. Hence, given any $\lambda$ and $\sigma_\epsilon$ with the same $\rho_t$, the algorithm obtains the same coefficients. This means that the algorithm need only be executed once to obtain solutions for all choices of $\lambda$ and $\sigma_\epsilon$.

$^1$ Recall, from the discussion of the terminal periods, that $a_{x,T} = -1$, $a_{y,T} = a_{\mu,T} = 0$, $b_{y,T} = -1/2$, $b_{\mu,T+1} = b_{\mu,T} = 0$, and $b_{y,T+1} = -1$, for all $\rho_{t-1}$.

Now, for each $t$, the policy coefficients are deterministic functions of $\rho_{t-1}$. For a fixed value of $\rho_{t-1}$, the coefficients can be stored as five numerical values. However, it is not feasible to simultaneously store coefficients associated with all possible values of $\rho_{t-1}$. Fortunately, given a linear policy for the trader, Theorem 1 establishes that $\rho_t$ is a deterministic function of $\rho_{t-1}$. Thus, the initial value $\rho_0$ determines all subsequent values of $\rho_t$. It follows that, for a fixed value of $\rho_0$, over the relevant portion of its domain, a linear-Gaussian PBE can be encoded in terms of $5(T-1)$ numerical values. We will design an algorithm that aims to compute these $5(T-1)$ parameters, which we will denote by $\{a_{x,t}, a_{y,t}, a_{\mu,t}, b_{y,t}, b_{\mu,t}\}$, for $t = 1, \ldots, T-1$. These parameters allow us to determine PBE actions at all visited states, so long as the initial value of $\rho_0$ is fixed.
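
To make this encoding concrete, the following is a minimal sketch of how five coefficients per time step might be stored and applied. It assumes the linear forms $u_t = a_{x,t} x_{t-1} + a_{y,t} y_{t-1} + a_{\mu,t} \mu_{t-1}$ for the trader and $v_t = b_{y,t} y_{t-1} + b_{\mu,t} \mu_{t-1}$ for the arbitrageur. The names `StageCoefficients`, `trader_trade`, `arbitrageur_trade`, and `storage_size` are hypothetical, and only the terminal coefficients recalled in footnote 1 are filled in, since the equilibrium values for earlier periods must come from the solver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageCoefficients:
    """Five numbers describing the linear policies at one time step (illustrative)."""
    a_x: float   # trader's weight on his own position x_{t-1}
    a_y: float   # trader's weight on the arbitrageur's position y_{t-1}
    a_mu: float  # trader's weight on the arbitrageur's mean belief mu_{t-1}
    b_y: float   # arbitrageur's weight on his own position y_{t-1}
    b_mu: float  # arbitrageur's weight on his mean belief mu_{t-1}


def trader_trade(c: StageCoefficients, x_prev: float, y_prev: float, mu_prev: float) -> float:
    """Assumed linear trader policy u_t."""
    return c.a_x * x_prev + c.a_y * y_prev + c.a_mu * mu_prev


def arbitrageur_trade(c: StageCoefficients, y_prev: float, mu_prev: float) -> float:
    """Assumed linear arbitrageur policy v_t."""
    return c.b_y * y_prev + c.b_mu * mu_prev


def storage_size(T: int) -> int:
    """Number of stored values for t = 1, ..., T-1 at a fixed rho_0."""
    return 5 * (T - 1)


# Terminal coefficients at time T, as recalled in footnote 1.
TERMINAL = StageCoefficients(a_x=-1.0, a_y=0.0, a_mu=0.0, b_y=-0.5, b_mu=0.0)
```

With these terminal values the trader liquidates whatever remains at time $T$ and the arbitrageur sells half of his position, regardless of beliefs.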

4.2. Searching for Equilibrium Variances 

The parameters $\{a_{x,t}, a_{y,t}, a_{\mu,t}, b_{y,t}, b_{\mu,t}\}$ characterize linear-Gaussian PBE policies restricted to the sequence $\rho_0, \ldots, \rho_{T-1}$ generated in the linear-Gaussian PBE. We do not know in advance what this sequence will be, and as such, we seek to compute this sequence simultaneously with the policy parameters.



One way to proceed, reminiscent of the bisection method employed by Kyle (1985) and Foster and Viswanathan (1994), would be to conjecture a value for $\rho_{T-1}$. Given a candidate value $\hat\rho_{T-1}$, the preceding values $\hat\rho_{T-2}, \ldots, \hat\rho_0$, along with policy parameters for times $T-1, \ldots, 1$, can be computed by sequentially solving the equations (12)-(17) for single-stage equilibria. The resulting policies form a linear-Gaussian PBE, restricted to the sequence $\hat\rho_0, \ldots, \hat\rho_{T-1}$ that they would generate if $\rho_0 = \hat\rho_0$. One can then seek a value of $\hat\rho_{T-1}$ such that the resulting $\hat\rho_0$ is indeed equal to $\rho_0$. This can be accomplished, for example, via bisection search.

The bisection method can be numerically unstable, however. This is because the belief update equation is used to sequentially compute the values $\hat\rho_{T-2}, \ldots, \hat\rho_0$ backwards in time. When the target value of $\rho_0$ is very large, small changes in $\hat\rho_{T-1}$ can result in very large changes in $\hat\rho_0$, making it difficult to match the value of $\rho_0$ precisely.

To avoid this numerical instability, consider Algorithm 3. This algorithm maintains a guess $\hat\pi$ of the equilibrium policy of the trader, and, along with the initial value $\rho_0$, this is used to generate the sequence $\hat\rho_1, \ldots, \hat\rho_{T-1}$ by applying the belief update equation forward in time. This sequence of values is then used in the single-stage equilibrium conditions to solve for policies $(\pi^*, \psi^*)$. A sequence of values $\rho^*_1, \ldots, \rho^*_{T-1}$ is then computed forward in time using the policy $\pi^*$. If this sequence matches the sequence generated by the guess $\hat\pi$, then the algorithm has converged. Otherwise, the algorithm is repeated with a new guess policy that is a convex combination of $\hat\pi$ and $\pi^*$. Since this algorithm only ever applies the belief update equation forward in time, it does not suffer from the numerical instabilities of the bisection method.

Note that Step 6 of the algorithm treats $\rho_{t-1}$ as a free variable that is solved alongside the policy parameters $\{a_{x,t}, a_{y,t}, a_{\mu,t}, b_{y,t}, b_{\mu,t}\}$. These variables are computed by simultaneously solving the system of equations (12)-(17) for single-stage equilibrium. To be precise, $a_{x,t}$ is obtained by solving the cubic polynomial equation (12) numerically. Given a value for $a_{x,t}$, the remaining parameters $\{a_{y,t}, a_{\mu,t}, b_{y,t}, b_{\mu,t}\}$ are obtained by solving the linear system of equations (13)-(15), while $\rho_{t-1}$ is obtained through (17). It can then be verified that the second order condition (16) holds. Algorithm 3 is implementable and we use it in computational studies presented in the next section.



Algorithm 3 Linear-Gaussian PBE Solver with Variance Search
 1: Initialize $\hat\pi$ to an equipartitioning policy
 2: for $k = 1, 2, \ldots$ do
 3:     Compute $\hat\rho_1, \ldots, \hat\rho_{T-1}$ according to the initial value $\rho_0$ and the policy $\hat\pi$ by the belief update equation
 4:     Initialize the terminal value functions $U^*_{T-1}$ and $V^*_{T-1}$ according to the terminal conditions
 5:     for $t = T-1, T-2, \ldots, 1$ do
 6:         Compute linear $(\pi^*_t, \psi^*_t)$ and $\rho_{t-1}$ solving the single-stage equilibrium conditions (12)-(17), assuming that $\rho_t = \hat\rho_t$
 7:         Compute the value functions $U^*_{t-1}$ and $V^*_{t-1}$ at the previous time step given $(\pi^*_t, \psi^*_t)$
 8:     end for
 9:     Compute $\rho^*_1, \ldots, \rho^*_{T-1}$ according to the initial value $\rho_0$ and the policy $\pi^*$ by the belief update equation
10:     if $\rho^* = \hat\rho$ then
11:         return
12:     else
13:         Set $\hat\pi \leftarrow \gamma_k \hat\pi + (1 - \gamma_k)\pi^*$, where $\gamma_k \in [0, 1)$ is a step-size
14:     end if
15: end for
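
The outer loop of Algorithm 3 is a damped fixed-point iteration, and the sketch below illustrates only that loop structure. The callables `forward_rho` and `solve_stages` are hypothetical stand-ins for the forward belief recursion and the backward single-stage solve, a constant step size is assumed in place of $\gamma_k$, and the exact equality test of Step 10 is replaced by a tolerance. The toy usage at the end substitutes a scalar map for the trading model purely to show the mechanics.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def damped_fixed_point(
    policy: Vector,
    forward_rho: Callable[[Vector], Vector],
    solve_stages: Callable[[Vector], Vector],
    gamma: float = 0.5,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Tuple[Vector, int]:
    """Repeat policy <- gamma*policy + (1-gamma)*policy_star until the rho
    sequences generated by the guess and by the response agree."""
    for k in range(1, max_iter + 1):
        rho_guess = forward_rho(policy)        # Step 3: forward pass with the guess
        policy_star = solve_stages(rho_guess)  # Steps 4-8: backward single-stage solves
        rho_star = forward_rho(policy_star)    # Step 9: forward pass with the response
        gap = max(abs(a - b) for a, b in zip(rho_guess, rho_star))
        if gap <= tol:                         # Step 10, with a numerical tolerance
            return policy_star, k
        # Step 13: convex combination of the guess and the response
        policy = [gamma * p + (1.0 - gamma) * q for p, q in zip(policy, policy_star)]
    raise RuntimeError("no convergence within max_iter iterations")


# Toy stand-ins (not the trading model): the "policy" is a single number,
# the rho sequence is the policy itself, and the response map is cos.
toy_policy, toy_iterations = damped_fixed_point(
    [1.0],
    forward_rho=lambda p: list(p),
    solve_stages=lambda rho: [math.cos(rho[0])],
)
```

Averaging the guess with the response keeps each update small, which is the role the step size plays in Step 13.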



5. Computational Results 

In this section, we present computational results generated using Algorithm 2. In Section 5.1, we introduce some alternative, intuitive policies which will serve as a basis of comparison to the linear-Gaussian PBE policy. In Section 5.2, we discuss the importance of the parameter $\rho_0 = \lambda\sigma_0/\sigma_\epsilon$ in the qualitative behavior of the Gaussian PBE policy and interpret $\rho_0$ as a measure of the "relative volume" of the trader's activity in the marketplace. In Section 5.3, we discuss the relative performance of the policies from the perspective of the execution cost of the trader. Here, we demonstrate experimentally that the Gaussian PBE policy can offer substantial benefits. In Section 5.4, we examine the signaling that occurs through price movements. Finally, in Section 5.5, we highlight the fact that the PBE policy is adaptive and dynamic, and seeks to exploit exogenous market fluctuations in order to minimize execution costs.

5.1. Alternative Policies 

In order to understand the behavior of linear-Gaussian PBE policies, we first define two alternative policies for the trader for the purpose of comparison. In the absence of an arbitrageur, it is optimal for the trader to minimize execution costs by partitioning his position into $T$ equally sized blocks and liquidating them sequentially over the $T$ time periods, as established by Bertsimas and Lo (1998). We refer to the resulting policy $\pi^{EQ}$ as an equipartitioning policy. It is defined by

$$\pi^{EQ}_t(x_{t-1}, y_{t-1}, \phi_{t-1}) = -\frac{1}{T-t+1}\, x_{t-1},$$

for all $t$, $x_{t-1}$, $y_{t-1}$, and $\phi_{t-1}$.

Alternatively, the trader may wish to liquidate his position in a way that reveals as little information as possible to the arbitrageur. Trading during the final two time periods $T-1$ and $T$ does not reveal information to the arbitrageur in a fashion that can be exploited. This is because, as discussed previously, the arbitrageur's optimal trades at times $T$ and $T+1$ are $v_T = -y_{T-1}/2$ and $v_{T+1} = -y_T$, respectively, and these are independent of any belief of the arbitrageur with respect to the trader's position. Given that the trader is free to trade over these two time periods without any information leakage, it is natural to minimize execution cost by equipartitioning over these two time periods. Hence, define the minimum revelation policy $\pi^{MR}$ to be a policy that liquidates the trader's position evenly across only the last two time periods. That is,

$$\pi^{MR}_t(x_{t-1}, y_{t-1}, \phi_{t-1}) = \begin{cases} 0 & \text{if } t < T-1, \\ -\tfrac{1}{2} x_{t-1} & \text{if } t = T-1, \\ -x_{t-1} & \text{if } t = T, \end{cases}$$

for all $t$, $x_{t-1}$, $y_{t-1}$, and $\phi_{t-1}$.

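
The two benchmark policies can be written as short functions, which makes the resulting trade schedules easy to inspect. This is a minimal sketch assuming the forms $u_t = -x_{t-1}/(T-t+1)$ for equipartitioning and the three-case rule for minimum revelation. The function names and the helper `schedule` are hypothetical, and both policies ignore $y_{t-1}$ and $\phi_{t-1}$, so those arguments are omitted.

```python
from typing import Callable, List


def equipartition_trade(t: int, T: int, x_prev: float) -> float:
    """Sell an equal share of the remaining position in each period t = 1, ..., T."""
    return -x_prev / (T - t + 1)


def minimum_revelation_trade(t: int, T: int, x_prev: float) -> float:
    """Do nothing until period T-1, then split the position over the last two periods."""
    if t < T - 1:
        return 0.0
    if t == T - 1:
        return -0.5 * x_prev
    return -x_prev  # t == T: liquidate whatever remains


def schedule(policy: Callable[[int, int, float], float], T: int, x0: float) -> List[float]:
    """Roll a position-dependent policy forward and collect the trades u_1, ..., u_T."""
    x, trades = x0, []
    for t in range(1, T + 1):
        u = policy(t, T, x)
        trades.append(u)
        x += u
    return trades
```

For example, with `T = 5` and `x0 = 100.0`, equipartitioning sells 20 shares in every period, while minimum revelation sells 50 shares in each of the last two periods.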
5.2. Relative Volume 

As observed in Section 4.1, linear-Gaussian PBE policies are determined as a function of the composite parameter $\rho_0 = \lambda\sigma_0/\sigma_\epsilon$. In order to interpret this parameter, consider the dynamics of price changes,

$$\Delta p_t = \lambda(u_t + v_t) + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma_\epsilon^2).$$

Here, $\epsilon_t$ is interpreted as the exogenous, random component of price changes. Alternatively, one can imagine that the random component of price changes arises from the price impact of "noise traders". Denote by $z_t$ the total order flow from noise traders at time $t$, and consider a model where

$$\Delta p_t = \lambda(u_t + v_t + z_t), \qquad z_t \sim N(0, \sigma_z^2).$$

If $\sigma_\epsilon = \lambda\sigma_z$, these two models are equivalent. In that case,

$$\rho_0 = \frac{\lambda\sigma_0}{\sigma_\epsilon} = \frac{\sigma_0}{\sigma_z}.$$

In other words, $\rho_0$ can be interpreted as the ratio of the uncertainty of the total volume of the trader's activity to the per period volume of noise trading. As such, we refer to $\rho_0$ as the relative volume.
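
As a quick numerical check of this interpretation, the sketch below computes the relative volume both ways. The parameter values are hypothetical and chosen only for illustration, and the function names are not from the paper.

```python
def relative_volume(lam: float, sigma_0: float, sigma_eps: float) -> float:
    """rho_0 = lambda * sigma_0 / sigma_eps."""
    return lam * sigma_0 / sigma_eps


def relative_volume_from_noise(sigma_0: float, sigma_z: float) -> float:
    """rho_0 = sigma_0 / sigma_z, valid when sigma_eps = lambda * sigma_z."""
    return sigma_0 / sigma_z


# Hypothetical parameters: impact coefficient, position uncertainty, noise-trader volume.
lam, sigma_0, sigma_z = 2e-5, 1e5, 2.5e4
sigma_eps = lam * sigma_z  # price noise implied by noise-trader order flow

rho_from_eps = relative_volume(lam, sigma_0, sigma_eps)
rho_from_noise = relative_volume_from_noise(sigma_0, sigma_z)

# Scaling lambda and sigma_eps by the same factor leaves rho_0 unchanged.
rho_rescaled = relative_volume(10 * lam, sigma_0, 10 * sigma_eps)
```

Both computations give a relative volume of about 4 here, and the rescaled parameters give the same value, which is why only the composite parameter matters.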

We shall see in the following sections that, qualitatively, the performance and behavior of Gaussian PBE policies are determined by the magnitude of $\rho_0$. In the high relative volume regime, when $\rho_0$ is large, either the initial position uncertainty $\sigma_0$ is very large or the volatility $\sigma_z$ of the noise traders is very small. In these cases, from the perspective of the arbitrageur, the trader's activity contributes a significant informative signal which can be decoded in the context of less significant exogenous random noise. Hence, the trader's activity early in the time horizon reveals significant information which can be exploited by the arbitrageur. Thus, it may be better for the trader to defer his liquidation until the end of the time horizon.

Alternatively, in the low relative volume regime, when $\rho_0$ is small, the arbitrageur cannot effectively distinguish the activity of the trader from the noise traders in the market. Hence, the trader is free to distribute his trades across the time horizon so as to minimize market impact, without fear of front-running by the arbitrageur.

5.3. Policy Performance

Consider a pair of policies $(\pi, \psi)$, and assume that the arbitrageur begins with a position $y_0 = 0$ and an initial belief $\phi_0 = N(0, \sigma_0^2)$. Given an initial position $x_0$, the trader's expected profit is $U_0^{\pi,\psi}(x_0, 0, \phi_0)$. One might imagine, however, that the initial position $x_0$ represents one of many different trials where the trader liquidates positions. It makes sense for this distribution of $x_0$ over trials to be consistent with the arbitrageur's belief $\phi_0$, since this belief could be based on past trials. Given this distribution, averaging over trials results in expected profit $E[U_0^{\pi,\psi}(x_0, 0, \phi_0)]$. Alternatively, if the trader liquidates his entire position immediately, the expected profit becomes $E[-\lambda x_0^2] = -\lambda\sigma_0^2$. We define the trader's normalized expected profit $U(\pi, \psi)$ to be the first of these quantities normalized by the magnitude of the second. When the trader's value function is TQD, this takes the form

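$$U(\pi, \psi) = \frac{E\left[U_0^{\pi,\psi}(x_0, 0, \phi_0)\right]}{\lambda\sigma_0^2} = -\frac{1}{2}\, c^{\rho_0}_{xx,0} + \frac{1}{\rho_0^2}\, c^{\rho_0}_{0,0},$$

where $c^{\rho_0}_{xx,0}$ and $c^{\rho_0}_{0,0}$ are the trader's appropriate value function coefficients at time $t = 0$.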



Analogously, the arbitrageur's normalized expected profit $V(\pi, \psi)$ is defined to be the expected profit of the arbitrageur normalized by the expected immediate liquidation cost of the trader. When the arbitrageur's value function is AQD, this takes the form

$$V(\pi, \psi) = \frac{V_0^{\pi,\psi}(0, \phi_0)}{\lambda\sigma_0^2} = \frac{1}{\rho_0^2}\, d^{\rho_0}_{0,0}.$$

Now, let $(\pi^*, \psi^*)$ denote a linear-Gaussian PBE. Since the corresponding value functions are TQD/AQD, the normalized expected profits depend on the parameters $\{\sigma_0, \lambda, \sigma_\epsilon\}$ only through the relative volume parameter $\rho_0 = \lambda\sigma_0/\sigma_\epsilon$.

Similarly, given the equipartitioning policy $\pi^{EQ}$, define $\psi^{EQ}$ to be the optimal response of the arbitrageur to the trader's policy $\pi^{EQ}$. This best response policy can be computed by solving the corresponding linear-quadratic control problem via dynamic programming. The policy takes the form

$$\psi^{EQ}_t(y_{t-1}, \phi_{t-1}) = \begin{cases} -\dfrac{1}{T+2-t}\, y_{t-1} - \dfrac{(T-t)(T-t+3)}{2(T+1-t)(T+2-t)}\, \mu_{t-1} & \text{if } 1 \le t \le T, \\ -y_T & \text{otherwise.} \end{cases}$$

Using a similar argument as above, it is easy to see that $U(\pi^{EQ}, \psi^{EQ})$ and $V(\pi^{EQ}, \psi^{EQ})$ are also functions of the parameter $\rho_0$.

Finally, given the minimum revelation policy $\pi^{MR}$, define $\psi^{MR}$ to be the optimal response of the arbitrageur to the trader's policy $\pi^{MR}$. It can be shown that, when $y_0 = 0$ and $\mu_0 = 0$, the best response of the arbitrageur to the minimum revelation policy is to do nothing: since no information is revealed by the trader in a useful fashion, there is no opportunity to front-run. Hence,

$$U(\pi^{MR}, \psi^{MR}) = \frac{E\left[-\tfrac{1}{2}\lambda x_0^2 - \tfrac{1}{4}\lambda x_0^2 \,\middle|\, \phi_0\right]}{\lambda\sigma_0^2} = -\frac{3}{4},$$

$$V(\pi^{MR}, \psi^{MR}) = 0.$$



In Figure 1, the normalized expected profits of various policies are plotted as functions of the relative volume $\rho_0$, for a time horizon $T = 20$. In all scenarios, as one might expect, the trader's profit is negative while the arbitrageur's profit is non-negative. In all cases, the trader's profit under the Gaussian PBE policy dominates that under either the equipartitioning policy or the minimum revelation policy. This difference is significant in moderate to high relative volume regimes.

Figure 1: The normalized expected profit of trading strategies for the time horizon $T = 20$.

In the high relative volume regime, the equipartitioning policy fares particularly badly from the perspective of the trader, performing up to a factor of 2 worse than the Gaussian PBE policy. This effect becomes more pronounced over longer time horizons. The minimum revelation policy performs about as well as the PBE policy. Asymptotically as $\rho_0 \uparrow \infty$, these policies offer equivalent performance in the sense that $U(\pi^*, \psi^*) \to U(\pi^{MR}, \psi^{MR})$.

On the other hand, in the low relative volume regime, the equipartitioning policy and the PBE policy perform comparably. Indeed, define $\psi^0$ by $\psi^0_t = 0$ for all $t$ (that is, no trading by the arbitrageur). In the absence of an arbitrageur, equipartitioning is the optimal policy for the trader, and backward recursion can be used to show that

$$U(\pi^{EQ}, \psi^0) = -\frac{T+1}{2T}.$$

Asymptotically as $\rho_0 \downarrow 0$, $U(\pi^{EQ}, \psi^{EQ}) \to U(\pi^{EQ}, \psi^0)$ and $U(\pi^*, \psi^*) \to U(\pi^{EQ}, \psi^0)$. Thus, the effect of the arbitrageur becomes negligible when the relative volume $\rho_0$ is sufficiently small.
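
The value $-(T+1)/(2T)$ for equipartitioning without an arbitrageur can be checked by direct accumulation. The sketch below assumes permanent linear price impact, no arbitrageur, and profit measured as revenue relative to the initial price; the zero-mean noise term is dropped because it does not affect the expectation for a deterministic schedule. The function name is hypothetical.

```python
def normalized_profit_constant_rate(periods: int, lam: float = 1.0, x0: float = 1.0) -> float:
    """Normalized profit from selling x0 in equal blocks over `periods` periods,
    with no arbitrageur, divided by the immediate-liquidation cost lam * x0**2."""
    block = x0 / periods
    price_move = 0.0  # cumulative price change relative to the initial price
    profit = 0.0
    for _ in range(periods):
        price_move -= lam * block      # each sale pushes the price down permanently
        profit += block * price_move   # block is sold at the post-impact price
    return profit / (lam * x0 ** 2)


T = 20
closed_form = -(T + 1) / (2 * T)
simulated = normalized_profit_constant_rate(T)
```

With a single period the function returns $-1$, the immediate-liquidation benchmark, and with two periods it returns $-3/4$, matching the minimum revelation value when the arbitrageur does not trade.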

From the perspective of the arbitrageur in equilibrium, $V(\pi^*, \psi^*) \to 0$ as $\rho_0 \to 0$ or $\rho_0 \to \infty$. In the low relative volume regime, the arbitrageur cannot distinguish the past activity of the trader from noise, and hence is not able to profitably predict and exploit the trader's future activity. In the high relative volume regime, as we shall see in Section 5.5, the trader conceals his position from the arbitrageur by deferring trading until the end of the horizon. Here, as with the minimum revelation policy, the arbitrageur is not able to profitably exploit the trader. Since the arbitrageur can choose not to trade at each period, his best response to any trading strategy should lead to non-negative expected profit. In light of these observations, we can infer that in equilibrium the arbitrageur's profit curve should have at least one local maximum.

Both the equipartitioning and minimum revelation policies trade at a constant rate, but over different, extremal time intervals: the equipartitioning policy uses the entire time horizon, while the minimum revelation policy uses only the last two time periods. A fairer benchmark policy might consider optimizing the choice of time interval. Define the variable time policy $\pi^{VT}$ as follows: given the value $\rho_0$, select the $\tau$ such that trading at a constant rate $u_t = -x_0/\tau$ over the last $\tau$ time periods results in the highest expected profit for the trader, assuming that the arbitrageur uses a best response policy. Define $\psi^{VT}$ to be the best response of the arbitrageur to $\pi^{VT}$. The variable time policy partially accounts for the presence of an arbitrageur, and the expected profit with the variable time strategy will always be at least as good as that of equipartitioning or minimum revelation. This is demonstrated by the $U(\pi^{VT}, \psi^{VT})$ curve in Figure 1. However, the trader still fares better with an equilibrium policy, particularly in the intermediate relative volume range, where the difference is close to 20%.$^2$

Examining Figure 1, it is clear that, in equilibrium, the sum of the normalized profits of the trader and the arbitrageur is negative, and the magnitude of the sum is larger than the magnitude of the loss incurred by the trader in the absence of the arbitrageur. Define the spill-over to be the quantity

$$U(\pi^{EQ}, \psi^0) - \left(U(\pi^*, \psi^*) + V(\pi^*, \psi^*)\right).$$

This is the difference between the normalized expected profit of the trader in the absence of the arbitrageur, under the optimal equipartitioning policy, and the combined normalized expected profits of the trader and arbitrageur in equilibrium. The spill-over measures the benefit of the arbitrageur's presence to the other participants of the system. Note that this benefit is positive, and it is most significant in the high relative volume regime.

[Plot: spill-over $U(\pi^{EQ}, \psi^0) - (U(\pi^*, \psi^*) + V(\pi^*, \psi^*))$ against the relative volume $\rho_0$ on a logarithmic axis.]

Figure 2: The spill-over of the system for the time horizon $T = 20$.

In addition to the discussion of expected profits above, we can consider the variance of the trader's profits under different policies. Given a pair of policies $(\pi, \psi)$, define the trader's normalized variance of profit $\mathrm{Var}\,U(\pi, \psi)$ as the variance under the policies $(\pi, \psi)$ relative to the variance of immediate liquidation. In other words,

$$\mathrm{Var}\,U(\pi, \psi) = \frac{\mathrm{Var}\left(\text{trader's profit under } (\pi, \psi)\right)}{\mathrm{Var}\left(-\lambda x_0^2 + \epsilon_1 x_0\right)} = \frac{\mathrm{Var}\left(\text{trader's profit under } (\pi, \psi)\right)}{2\lambda^2\sigma_0^4 + \sigma_\epsilon^2\sigma_0^2},$$

where, as before, the expectations are taken assuming the policies $(\pi, \psi)$ are used, $y_0 = \mu_0 = 0$, and $x_0 \sim \phi_0 = N(0, \sigma_0^2)$. Similarly, it is possible to see that, for a pair of linear policies $(\pi, \psi)$, the trader's normalized variance of profit depends on the model parameters $\{\sigma_0, \lambda, \sigma_\epsilon\}$ only through $\rho_0$.

$^2$ In practice, improvements of as low as 0.01% are considered significant.

In Figure 3, the trader's normalized variance of profit is plotted under the different policies. The lowest variance occurs when the trader equipartitions and there is no arbitrageur; this is the curve $\mathrm{Var}\,U(\pi^{EQ}, \psi^0)$. When the arbitrageur is present, however, the variance in equilibrium $\mathrm{Var}\,U(\pi^*, \psi^*)$ is less than when the trader either equipartitions (i.e., the curve $\mathrm{Var}\,U(\pi^{EQ}, \psi^{EQ})$) or employs the minimum revelation policy (i.e., the curve $\mathrm{Var}\,U(\pi^{MR}, \psi^{MR})$). Figure 4 shows the entire cumulative distribution function of the trader's normalized profit under various relative volume regimes. Given the presence of the arbitrageur, the equilibrium policy has second-order dominance over equipartitioning in all relative volume regimes.



Figure 3: The trader's normalized variance of profit for the time horizon $T = 20$.

Figure 4: The cumulative distribution of trader's normalized profit for the time horizon $T = 20$. Panels: (a) the low relative volume regime, $\rho_0 = 1$; (b) the moderate relative volume regime, $\rho_0 = 10$; (c) the high relative volume regime, $\rho_0 = 100$.

5.4. Signaling 

An important aspect of the linear-Gaussian PBE policy is that it accounts for information conveyed through price movements. In order to understand this feature, define the relative uncertainty to be the standard deviation of the arbitrageur's belief about the trader's position at time $t$, relative to that of the belief at time 0; i.e., the ratio $\sigma_t/\sigma_0$. By considering the evolution of relative uncertainty over time for the Gaussian PBE policy versus the equipartitioning and minimum revelation policies, we can study the comparative signaling behavior.

Under any linear policy, the evolution of the relative uncertainty $\sigma_t/\sigma_0$ over time is deterministic and depends only on the parameter $\rho_0$. This follows from the fact that $\sigma_t/\sigma_0 = \rho_t/\rho_0$, together with the earlier result that $\rho_t$ is a deterministic function of $\rho_{t-1}$. In Figure 5, the evolution of the relative uncertainty of the PBE policy is illustrated, for different values of $\rho_0$, as compared to the equipartitioning and minimum revelation policies. In the low relative volume regime, the relative uncertainty of the PBE policy evolves similarly to that of the equipartitioning policy. In the high relative volume regime, very little information is revealed until close to the end of the trading period under the PBE policy. Indeed, the relative uncertainties under the equilibrium and the minimum revelation policies are indistinguishable on the scale of Figure 5 when $\rho_0 = 10$ or $\rho_0 = 100$. These observations are consistent with our results from Section 5.3.

Figure 5: The evolution of relative uncertainty of the trader's position for the time horizon $T = 20$.



5.5. Adaptive Trading 



One important feature of the linear-Gaussian PBE policy is that it is adaptive in the sense that the trades executed are random quantities that are dependent on the exogenous, stochastic fluctuations of the market. This is in contrast to the policies developed in most of the optimal execution literature. For example, the baseline equipartitioning policy of Bertsimas and Lo (1998) specifies a deterministic sequence of trades. Static policies have also been derived under more complicated models (e.g., Almgren and Chriss, 2000; Huberman and Stanzl, 2005; Obizhaeva and Wang, 2005; Alfonsi et al., 2007a). However, this behavior is in contrast to what is observed amongst institutional traders and trading algorithms that are implemented by practitioners. One justification for adaptive, price-responsive trading strategies is risk aversion. It has been observed that optimal policies for certain risk averse objectives require dynamic trading (Hora, 2006; Almgren and Lorenz, 2006). Our model provides another justification: in the presence of asymmetric information and a strategic adversary, a trader should seek to exploit price fluctuations so as to disguise trading activity.

In order to understand the behavior of linear policies, it is helpful to decompose them into deterministic and stochastic components. Suppose that $(\pi, \psi)$ are a pair of linear policies, and that $y_0 = \mu_0 = 0$. Given Definition 3 and Theorem 1, it is easy to see that, for each $1 \le t \le T$, there exist vectors $\alpha_{\epsilon,t}, \beta_{\epsilon,t}, \gamma_{\epsilon,t} \in \mathbb{R}^t$ and scalars $\alpha_{x_0,t}, \beta_{x_0,t}, \gamma_{x_0,t} \in \mathbb{R}$, each of which depends on the parameters $\{\sigma_0, \lambda, \sigma_\epsilon\}$ only through $\rho_0$, such that



(18) 



Here, $\epsilon^t = (\epsilon_1, \ldots, \epsilon_t)$ is the vector of exogenous disturbances up to time $t$. The first terms in (18) represent deterministic components of the policy and the second terms represent zero-mean stochastic components that depend on market price fluctuations. For the equipartitioning and minimum revelation policies, the stochastic components are zero. On the other hand, the Gaussian PBE policy does have non-zero stochastic components.

Figure 6 shows the deterministic component of the linear-Gaussian PBE policy versus those of the equipartitioning and minimum revelation policies. As $\rho_0 \to 0$, the trader ignores the presence of the arbitrageur and the PBE policy approaches the equipartitioning policy. At the other extreme, as $\rho_0 \to \infty$, in equilibrium the trader seeks to conceal his activity as much as possible, and hence the PBE policy approaches the minimum revelation policy.

Figure 7 illustrates sample paths of the trader's position under the linear-Gaussian PBE policy. Along each path, the trader deviates from the deterministic schedule based on the random fluctuations of the market and how they influence the arbitrageur's beliefs. In general, if the arbitrageur's estimate of the trader's position becomes more accurate, the trader accelerates his selling to avoid front-running. On the other hand, if the arbitrageur is misled as to the trader's position, the trader delays his selling relative to the deterministic schedule.



6. Extensions 

In this section, we revisit some of the assumptions in the problem formulation of Section 2. At a high level, the main feature of our model that enables tractability is that, in equilibrium, each agent solves a linear-quadratic Gaussian control problem. This requires that the evolution of the model over time be described by a linear system and that the objectives of the trader and arbitrageur be quadratic functions that decompose additively over time. As we shall see shortly, there are a number of extensions of the model one may consider, incorporating important phenomena such as risk aversion and transient price impact, that maintain this structure. Such extensions remain tractable and can be addressed using straightforward adaptations of the techniques we have developed.

Figure 7: Sample paths of the evolution of the trader's actual and expected positions, and the arbitrageur's mean belief, when $T = 20$, $x_0 = \sigma_0 = 10^{[\ldots]}$, $y_0 = \mu_0 = 0$, $\sigma_\epsilon = 0.125$, $\lambda = 10^{[\ldots]}$.

6.1. Time Horizon 

Our model assumes that the trader begins his liquidation at time 1 and completes it by time T, and that this 
time interval is common knowledge. In some instances, public knowledge of the beginning and end of the 
liquidation interval might be reasonable since, for example, this interval will often correspond to a single 
trading day. More generally, however, it may be desirable to impose uncertainty on the part of the arbitrageur as
to the beginning and end of the liquidation. Unfortunately, it is not clear how to allow for this in a tractable 
fashion in our current framework. 

The model further assumes that the arbitrageur must liquidate his position by time $T+1$. Then, the value function of the arbitrageur at time $T$ with position $y_T$ is given by $V^*_T(y_T) = -\lambda y_T^2$. This was used in the terminal conditions to determine the value functions $U^*_{T-1}$ and $V^*_{T-1}$, which form the base case of the backward induction. This assumption can easily be relaxed. For example, suppose that the arbitrageur has $T_a$ additional trading periods. It is easy to see that, after time $T$, the arbitrageur will optimally equipartition over the remaining $T_a$ periods. Therefore, the value of a position $y_T$ at time $T$ will take the form $V^*_T(y_T) = -\lambda \frac{T_a+1}{2T_a}\, y_T^2$, following the analysis in Bertsimas and Lo (1998). So long as $V^*_T$ is a quadratic function, our discussion in Sections 3 and 4 carries through, with a different choice of terminal value functions.



6.2. Risk Aversion 



Our model assumes that both the trader and arbitrageur are risk-neutral. One way to account for risk aversion is to follow the approach suggested by Hora (2006). In particular, we could assume that, for example, the trader seeks to optimize the objective function



T-l 



r=0 



The second term in the sum penalizes for variance in revenue in each time period, with $\eta \ge 0$ capturing the degree of risk aversion. The final term represents a per stage holding cost, with the parameter $\zeta \ge 0$ expressing the degree to which the trader would prefer to execute sooner rather than later. The risk neutral case previously considered corresponds to the choice of $\eta = \zeta = 0$. For any nonnegative parameter choices, the objective remains a time separable positive definite quadratic function. Hence, the methods of Sections 3 and 4 can be suitably adapted.



6.3. Price Impact & Price Dynamics 

Our model assumed permanent and linear price impact. Empirically, it has been observed that transient price 
impact is a significant component of price dynamics, and it is important to account for this in the design of 
execution strategies. 

More generally, our analysis applies when there is some collection of state variables (for example, $\{x_t, y_t, \mu_t\}$) that evolve as a linear dynamical system with Gaussian disturbances, and where changes in price are linear in the state variables. In order to incorporate transient price impact, assume that prices evolve according to

$$p_t = p_0 + \underbrace{\lambda \sum_{\tau=1}^{t} (u_\tau + v_\tau + z_\tau)}_{\text{permanent price impact}} + \underbrace{\gamma \sum_{\tau=1}^{t} \alpha^{t-\tau} (u_\tau + v_\tau + z_\tau)}_{\text{transient price impact}}. \qquad (19)$$

Here, $u_\tau$ and $v_\tau$ are the trades of the trader and arbitrageur, respectively, at time $\tau$. In place of the exogenous noise term in the original price dynamics (1), $z_\tau$ is an IID $N(0, \sigma_z^2)$ random variable representing the quantity of noise trades at time $\tau$. The second term in (19) captures a permanent, linear price impact with sensitivity $\lambda > 0$. The final term represents a transient, linear price impact with sensitivity $\gamma > 0$ and recovery rate $\alpha \in [0, 1)$.

These price dynamics can be rewritten as

$$p_t = p_{t-1} + (\lambda + \gamma)(u_t + v_t + z_t) - \gamma(1 - \alpha) s_{t-1},$$

where $s_t$ is defined to be the geometrically weighted total order flow

$$s_t = \sum_{\tau=1}^{t} \alpha^{t-\tau} (u_\tau + v_\tau + z_\tau) = \alpha s_{t-1} + (u_t + v_t + z_t).$$

Now, suppose that the trader's decision $u_t$ is a linear function of $\{x_{t-1}, y_{t-1}, \mu_{t-1}, s_{t-1}\}$, and the arbitrageur's decision $v_t$ is a linear function of $\{y_{t-1}, \mu_{t-1}, s_{t-1}\}$. Then, it will be the case that $\{x_t, y_t, \mu_t, s_t\}$ evolve as a linear dynamical system, and that the price changes are linear in these state variables. Therefore, the analysis in Sections 3 and 4 can be suitably modified and repeated, with an augmented state space. Note that, since $s_t$ is a function only of the total quantities traded at times up to $t$, it is reasonable to assume that this is public knowledge known to both the trader and arbitrageur.
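
The equivalence between the summation form of the price dynamics and the recursive form with the auxiliary state $s_t$ can be verified numerically. The sketch below takes an arbitrary sequence of total order flows $q_t = u_t + v_t + z_t$ and computes prices both ways; the function names and the convention $s_0 = 0$ (an empty sum) are assumptions made for illustration.

```python
from typing import List


def prices_direct(p0: float, flows: List[float], lam: float, gamma: float, alpha: float) -> List[float]:
    """Prices from the summation form: permanent plus geometrically decaying impact."""
    prices = []
    for t in range(1, len(flows) + 1):
        permanent = lam * sum(flows[:t])
        transient = gamma * sum(alpha ** (t - tau) * flows[tau - 1] for tau in range(1, t + 1))
        prices.append(p0 + permanent + transient)
    return prices


def prices_recursive(p0: float, flows: List[float], lam: float, gamma: float, alpha: float) -> List[float]:
    """Prices from the recursion in p_{t-1} and the weighted order flow s_{t-1}."""
    prices, p_prev, s_prev = [], p0, 0.0  # s_0 = 0 since no orders have arrived yet
    for q in flows:
        p = p_prev + (lam + gamma) * q - gamma * (1.0 - alpha) * s_prev
        s_prev = alpha * s_prev + q  # update the geometrically weighted order flow
        prices.append(p)
        p_prev = p
    return prices
```

The recursive form needs only the pair $(p_{t-1}, s_{t-1})$ at each step, which is what allows $s_t$ to serve as one additional state variable.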

Other aspects of more complicated price dynamics can also be incorporated via such state augmentation. For example, one may consider linear factor models or otherwise add exogenous explanatory variables to the evolution of prices, so long as the dependencies are linear. Similarly, models that incorporate drift in the price process, such as short term momentum or mean reversion, can be considered.

6.4. Parameterized Policies

Beyond solving specific classes of models, results from the optimal execution literature offer useful guidance 
on how to structure parameterized execution policies that can be effective even if modeling assumptions are 
not entirely valid. In this vein, concepts we have developed can enhance parameterized policies that one 
might design based on prior literature. 

For example, consider designing an execution system which begins the trading day with a position that must be liquidated by the end of that trading day. A number of models previously considered in the literature result in deterministic linear policies (see, e.g., Bertsimas and Lo, 1998; Obizhaeva and Wang, 2005; Alfonsi et al., 2007a). In particular, for each $t$th time period during the course of the day, there is a parameter $a_t$ that indicates the fraction of the position to sell during that time period. These parameters $a_0, \ldots, a_{T-1}$ depend on asset-specific characteristics such as volatility and market impact model parameters.

Modeling assumptions often do not match reality. As such, it is useful to add flexibility by parametrizing 
the execution policy. For example, we might employ a policy that sells a fraction $\theta_t a_t$ of the position during 
each time period $t$, where $\theta_0, \ldots, \theta_{T-1}$ are asset-independent parameters. Then, these parameters can be 
tuned based on experience from trading all assets. It is important that the number of parameters does not 
scale with the number of assets, because we would then be unlikely to have a sufficient amount of data to 
tune parameters. In this regard, the way $a_0, \ldots, a_{T-1}$ capture variations across assets is critical to the design 
of an effective parametrization. 
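
A minimal sketch of this scaling in Python may help fix ideas. The baseline fractions below are the equipartition schedule $a_t = 1/T$; the function name `scaled_schedule` and the reading of each fraction as a share of the initial position are assumptions made for illustration only.

```python
def scaled_schedule(x0, a, theta):
    """Shares sold in each period under the scaled policy theta_t * a_t.

    Assumption (not from the text): each fraction is taken relative to
    the initial position x0, not the remaining position.
    """
    if len(a) != len(theta):
        raise ValueError("a and theta must have the same length")
    return [th * frac * x0 for th, frac in zip(theta, a)]


T = 4
a = [1.0 / T] * T       # asset-specific baseline fractions (equipartition here)
theta = [1.0] * T       # asset-independent tuning parameters
trades = scaled_schedule(1000.0, a, theta)
```

With every $\theta_t = 1$ the baseline schedule is recovered. Under the assumption above, a tuned $\theta$ liquidates the full position only if $\sum_t \theta_t a_t = 1$, so a practical tuning procedure would presumably need to enforce that constraint.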

Our work motivates a generalized class of parameterized policies that adapt trades as price movements 
are observed. Our model is optimized by an execution strategy with three sequences of coefficients: 
$\{a_{x,t}, a_{y,t}, a_{\mu,t} \mid t = 0, \ldots, T-1\}$. By simulating arbitrageur activity over the course of the day and applying 
these coefficients appropriately, we produce a sequence of trades that adapt to price fluctuations. As in 
the case of a deterministic policy, we can introduce parameters $\{\theta_{x,t}, \theta_{y,t}, \theta_{\mu,t} \mid t = 0, \ldots, T-1\}$ that 
scale the policy coefficients, and tune these parameters based on experience. Once again, these parameters 
are asset-independent while the coefficients $\{a_{x,t}, a_{y,t}, a_{\mu,t} \mid t = 0, \ldots, T-1\}$ capture dependence of the 
policy on asset-specific characteristics such as volatility and market impact model parameters. 
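
The scaled adaptive rule can be sketched as follows. This is an illustration under stated assumptions: the function name `adaptive_trade`, the tuple layout, and the numerical coefficient values are hypothetical, and the simulated arbitrageur position `y_prev` and belief mean `mu_prev` are taken as given inputs rather than computed here.

```python
def adaptive_trade(x_prev, y_prev, mu_prev, coeffs, thetas):
    """Trade for one period: a scaled linear rule in (x, y, mu).

    coeffs = (a_x, a_y, a_mu): asset-specific policy coefficients.
    thetas = (th_x, th_y, th_mu): asset-independent scaling parameters.
    """
    a_x, a_y, a_mu = coeffs
    th_x, th_y, th_mu = thetas
    return (th_x * a_x * x_prev
            + th_y * a_y * y_prev
            + th_mu * a_mu * mu_prev)


# Hypothetical values, chosen only to exercise the formula.
u = adaptive_trade(1000.0, -50.0, 800.0,
                   coeffs=(-0.25, 0.1, 0.05),
                   thetas=(1.0, 1.0, 1.0))
```

Setting the scaling parameters for the $y$ and $\mu$ terms to zero reduces this rule to a trade that depends on the trader's own position alone.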

7. Conclusion 



Our model captures strategic interactions between a trader aiming to liquidate a position and an arbitrageur 
trying to detect and profit from the trader's activity. The algorithm we have developed computes Gaussian 
perfect Bayesian equilibrium behavior. It is interesting that the resulting trader policy takes on such a simple 
form: the number of shares to liquidate at time $t$ is linear in the trader's position $x_{t-1}$, the arbitrageur's 
position $y_{t-1}$, and the arbitrageur's estimate $\mu_{t-1}$ of $x_{t-1}$. The coefficients of the policy depend only on the 
relative volume parameter $\rho_0$, which quantifies the magnitude of the trader's position relative to the typical 
market activity, and the time horizon $T$. This policy offers useful guidance beyond what has been derived in 
models that do not account for arbitrageur behavior. In the absence of an arbitrageur, it is optimal to trade 
equal amounts over each time period, which corresponds to a policy that is linear in $x_{t-1}$. The difference 
in the PBE policy stems from its accounting of the arbitrageur's inference process. In particular, the policy 
reduces information revealed to the arbitrageur by delaying trades and takes advantage of situations where 
the arbitrageur has been misled by unusual market activity. 
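
To make the no-arbitrageur benchmark concrete, the sketch below checks that selling equal amounts in each of $T$ periods is the same as selling the fraction $1/(T - t + 1)$ of the remaining position $x_{t-1}$ in period $t = 1, \ldots, T$, which is a rule linear in $x_{t-1}$. The function name and the indexing of periods from 1 are assumptions chosen for illustration.

```python
def equipartition_trades(x0, T):
    """Liquidate x0 over T periods with the linear rule u_t = -x_{t-1}/(T-t+1)."""
    x = x0
    trades = []
    for t in range(1, T + 1):
        u = -x / (T - t + 1)   # negative: a sale reduces the position
        trades.append(u)
        x += u
    return trades, x


trades, final_position = equipartition_trades(1200.0, 4)
```

Each entry of `trades` equals $-x_0 / T$ and the final position is zero, so the time-varying linear rule reproduces the equal-amounts schedule.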

Our model represents a starting point for the study of game theoretic behavior in trade execution. It 
has an admittedly simple structure, and this allows for a tractable analysis that highlights the importance of 
information signaling. There are a number of extensions to this model that are possible, however, and that 
warrant further discussion: 

1. (Flexible Time Horizon) We assume finite time horizons T and T + 1 for the trader and arbitrageur, 
respectively. The choice of time horizon has an impact on the resulting equilibrium policies, and there 
are clearly end-of-horizon effects in the policies computed in Section 5. To some extent it seems 
artificial to impose a fixed time horizon as an exogenous restriction on behavior. Fixed horizon models 
preclude the trader from delaying liquidation beyond the horizon even if this can yield significant 
benefits, for example. A better model would be to consider an infinite horizon game, where risk 
aversion provides the motivation for liquidating a position sooner rather than later. 

2. (Uncertain Trader) In our model, we assume that the arbitrageur is uncertain of the trader's position, 
but that the trader knows everything. A more realistic model would allow for uncertainty on the part 
of the trader as well, and would allow for the arbitrageur to mislead the trader. 

3. (Multi-player Games) Our model restricts to a single trader and arbitrageur. A natural extension would 
be to consider multiple traders and arbitrageurs that are uncertain about each others' positions and must 
compete in the marketplace as they unwind. Such a generalized model could be useful for analysis of 
important liquidity issues such as those arising from the credit crunch of 2007. 

Also of interest are the potential empirical implications of the model. If we make the assumption that the 
trade execution horizon is a single day, the observations in Section 5 suggest particular patterns for intraday 
volume. For example, if $\rho_0$ is large, the volume traded should be much higher near the end of the day than 
at other times. Similarly, the structure of the equilibrium trading policies for the trader and arbitrageur will 
generate specific, time-varying autocorrelation in the increments of the price process. Formulating tests of 
such empirical predictions is an interesting area for future research. 

Finally, beyond the immediate context of our model, there are many directions worth exploring. One 
important avenue is to factor data beyond price into the execution strategy. For example, volume data may 
play a significant role in the arbitrageur's inference, in which case it should also influence execution decisions. 
Limit order book data may also be relevant. Developing tractable models that account for such data remains 
a challenge. One initiative to incorporate limit order book data into the decision process is presented by 
Nevmyvaka et al. (2006). 

A. Proofs 



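Theorem 1. If the belief distribution $\phi_{t-1}$ at time $t-1$ is Gaussian, and the arbitrageur assumes that the trader's 
policy $\hat{\pi}_t$ is linear with 

$$\hat{\pi}_t(x_{t-1}, y_{t-1}, \phi_{t-1}) = \hat{a}^{\rho_{t-1}}_{x,t}\, x_{t-1} + \hat{a}^{\rho_{t-1}}_{y,t}\, y_{t-1} + \hat{a}^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1},$$

then the belief distribution $\phi_t$ is also Gaussian. The mean $\mu_t$ is a linear function of $y_{t-1}$, $\mu_{t-1}$, and the 
observed price change $\Delta p_t$, with coefficients that are deterministic functions of the scaled variance $\rho_{t-1}$. 
The scaled variance $\rho_t$ evolves according to 

$$\rho_t^2 = \left(1 + \hat{a}^{\rho_{t-1}}_{x,t}\right)^2 \left( \frac{1}{\rho_{t-1}^2} + \left(\hat{a}^{\rho_{t-1}}_{x,t}\right)^2 \right)^{-1}.$$

In particular, $\rho_t$ is a deterministic function of $\rho_{t-1}$. 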

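Proof. Set $\{K_{t-1}, h_{t-1}\}$ to be the information form parameters for the Gaussian distribution $\phi_{t-1}$, so that 

$$K_{t-1} = 1/\sigma_{t-1}^2, \quad \text{and} \quad h_{t-1} = \mu_{t-1}/\sigma_{t-1}^2.$$

Define $\phi^+_{t-1}$ to be the distribution of $x_{t-1}$ conditioned on all information seen by the arbitrageur at times up 
to and including $t$. That is, 

$$\phi^+_{t-1}(S) = \Pr\left( x_{t-1} \in S \mid \phi_{t-1},\, y_{t-1},\, \lambda\left(\hat{\pi}_t(x_{t-1}, y_{t-1}, \phi_{t-1}) + v_t\right) + \epsilon_t = \Delta p_t \right),$$

where $\Delta p_t$ is the price change observed at time $t$. By Bayes' rule, this distribution has density 

$$\phi^+_{t-1}(dx) \propto \phi_{t-1}(dx)\, \exp\left\{ -\frac{\left(\Delta p_t - \lambda\left(\hat{\pi}_t(x, y_{t-1}, \phi_{t-1}) + \psi_t(y_{t-1}, \phi_{t-1})\right)\right)^2}{2\sigma_\epsilon^2} \right\}$$

$$\propto \exp\left\{ -\frac{1}{2}\left( K_{t-1} + \frac{\lambda^2 \left(\hat{a}^{\rho_{t-1}}_{x,t}\right)^2}{\sigma_\epsilon^2} \right) x^2 + \left( h_{t-1} + \frac{\lambda\, \hat{a}^{\rho_{t-1}}_{x,t} \left( \Delta p_t - \lambda\left( \hat{a}^{\rho_{t-1}}_{y,t} y_{t-1} + \hat{a}^{\rho_{t-1}}_{\mu,t} \mu_{t-1} + \psi_t(y_{t-1}, \phi_{t-1}) \right) \right)}{\sigma_\epsilon^2} \right) x \right\} dx.$$

Thus, $\phi^+_{t-1}$ is a Gaussian distribution, with variance 

$$\left( K_{t-1} + \frac{\lambda^2 \left(\hat{a}^{\rho_{t-1}}_{x,t}\right)^2}{\sigma_\epsilon^2} \right)^{-1}$$

and mean 

$$\left( K_{t-1} + \frac{\lambda^2 \left(\hat{a}^{\rho_{t-1}}_{x,t}\right)^2}{\sigma_\epsilon^2} \right)^{-1} \left( h_{t-1} + \frac{\lambda\, \hat{a}^{\rho_{t-1}}_{x,t} \left( \Delta p_t - \lambda\left( \hat{a}^{\rho_{t-1}}_{y,t} y_{t-1} + \hat{a}^{\rho_{t-1}}_{\mu,t} \mu_{t-1} + \psi_t(y_{t-1}, \phi_{t-1}) \right) \right)}{\sigma_\epsilon^2} \right).$$

Now, note that 

$$x_t = x_{t-1} + \hat{\pi}_t(x_{t-1}, y_{t-1}, \phi_{t-1}) = \left(1 + \hat{a}^{\rho_{t-1}}_{x,t}\right) x_{t-1} + \hat{a}^{\rho_{t-1}}_{y,t}\, y_{t-1} + \hat{a}^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1}.$$

Then, $\phi_t$ is also a Gaussian distribution, with variance 

$$\sigma_t^2 = \left(1 + \hat{a}^{\rho_{t-1}}_{x,t}\right)^2 \left( K_{t-1} + \frac{\lambda^2 \left(\hat{a}^{\rho_{t-1}}_{x,t}\right)^2}{\sigma_\epsilon^2} \right)^{-1} = \left(1 + \hat{a}^{\rho_{t-1}}_{x,t}\right)^2 \left( \frac{1}{\sigma_{t-1}^2} + \frac{\lambda^2 \left(\hat{a}^{\rho_{t-1}}_{x,t}\right)^2}{\sigma_\epsilon^2} \right)^{-1} \tag{20}$$

and mean 

$$\mu_t = \hat{a}^{\rho_{t-1}}_{y,t}\, y_{t-1} + \hat{a}^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1} + \left(1 + \hat{a}^{\rho_{t-1}}_{x,t}\right) \left( K_{t-1} + \frac{\lambda^2 \left(\hat{a}^{\rho_{t-1}}_{x,t}\right)^2}{\sigma_\epsilon^2} \right)^{-1} \left( h_{t-1} + \frac{\lambda\, \hat{a}^{\rho_{t-1}}_{x,t} \left( \Delta p_t - \lambda\left( \hat{a}^{\rho_{t-1}}_{y,t} y_{t-1} + \hat{a}^{\rho_{t-1}}_{\mu,t} \mu_{t-1} + \psi_t(y_{t-1}, \phi_{t-1}) \right) \right)}{\sigma_\epsilon^2} \right). \tag{21}$$

The conclusions of the theorem immediately follow. ■ 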
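
The belief update in Theorem 1 can be sketched numerically. The code below is a hedged illustration, not the authors' implementation: it assumes the price change is $\Delta p_t = \lambda(u_t + v_t) + \epsilon_t$ with $\epsilon_t \sim N(0, \sigma_\epsilon^2)$ and the scaled variance is $\rho_t = \lambda \sigma_t / \sigma_\epsilon$, and the names `update_belief`, `rho_next`, `lam`, and `sigma_eps2` are hypothetical.

```python
import math


def update_belief(mu_prev, sigma2_prev, y_prev, dp, v, a_x, a_y, a_mu,
                  lam, sigma_eps2):
    """One Gaussian belief update in information form (assumed model)."""
    K = 1.0 / sigma2_prev                     # prior precision
    h = mu_prev / sigma2_prev                 # prior precision-weighted mean
    K_plus = K + lam ** 2 * a_x ** 2 / sigma_eps2
    # Part of the price change not explained by known quantities.
    resid = dp - lam * (a_y * y_prev + a_mu * mu_prev + v)
    h_plus = h + lam * a_x * resid / sigma_eps2
    post_mean = h_plus / K_plus               # posterior mean of x_{t-1}
    post_var = 1.0 / K_plus                   # posterior variance of x_{t-1}
    # Push the posterior through x_t = (1 + a_x) x_{t-1} + a_y y + a_mu mu.
    mu = (1.0 + a_x) * post_mean + a_y * y_prev + a_mu * mu_prev
    sigma2 = (1.0 + a_x) ** 2 * post_var
    return mu, sigma2


def rho_next(rho_prev, a_x):
    """Scaled-variance recursion stated in Theorem 1."""
    return math.sqrt((1.0 + a_x) ** 2 / (1.0 / rho_prev ** 2 + a_x ** 2))


lam, sigma_eps2, sigma2_prev = 2.0, 4.0, 9.0
mu, sigma2 = update_belief(100.0, sigma2_prev, 10.0, dp=-3.0, v=1.0,
                           a_x=-0.5, a_y=0.1, a_mu=0.2,
                           lam=lam, sigma_eps2=sigma_eps2)
rho_prev = lam * math.sqrt(sigma2_prev / sigma_eps2)
rho_t = lam * math.sqrt(sigma2 / sigma_eps2)
```

Under these assumptions `rho_t` agrees with `rho_next(rho_prev, -0.5)` and does not depend on the observed price change, which is the sense in which $\rho_t$ is a deterministic function of $\rho_{t-1}$.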

In order to prove Theorems 2-4 it is necessary to explicitly evaluate the operator $F^{(\psi_t, \hat{\pi}_t)}_{u_t}$ applied to 
quadratic functions of $\{x_t, y_t, \mu_t\}$ and the operator $G^{\hat{\pi}_t}_{v_t}$ applied to quadratic functions of $\{y_t, \mu_t\}$. The 
following lemma is helpful for this purpose, as it provides expressions for the expectation of $\mu_t$ and $\mu_t^2$ under 
various distributions. 

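Lemma 1. Assume that the policies $\psi_t$ and $\hat{\pi}_t$ are linear with 

$$\hat{\pi}_t(x_{t-1}, y_{t-1}, \phi_{t-1}) = \hat{a}^{\rho_{t-1}}_{x,t}\, x_{t-1} + \hat{a}^{\rho_{t-1}}_{y,t}\, y_{t-1} + \hat{a}^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1},$$

$$\psi_t(y_{t-1}, \phi_{t-1}) = b^{\rho_{t-1}}_{y,t}\, y_{t-1} + b^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1}.$$

Define 

$$\gamma^{\rho_{t-1}}_t \triangleq \frac{1 + \hat{a}^{\rho_{t-1}}_{x,t}}{1/\rho_{t-1}^2 + \left(\hat{a}^{\rho_{t-1}}_{x,t}\right)^2}.$$

Then, 

$$\mathsf{E}^{(\psi_t, \hat{\pi}_t)}_{u_t}\left[\mu_t \mid x_{t-1}, y_{t-1}, \phi_{t-1}\right] = \hat{a}^{\rho_{t-1}}_{y,t}\, y_{t-1} + \hat{a}^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1} + \gamma^{\rho_{t-1}}_t \mu_{t-1}/\rho_{t-1}^2 + \gamma^{\rho_{t-1}}_t\, \hat{a}^{\rho_{t-1}}_{x,t} \left( u_t - \hat{a}^{\rho_{t-1}}_{y,t}\, y_{t-1} - \hat{a}^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1} \right), \tag{22a}$$

$$\mathrm{Var}^{(\psi_t, \hat{\pi}_t)}_{u_t}\left[\mu_t \mid x_{t-1}, y_{t-1}, \phi_{t-1}\right] = \left( \gamma^{\rho_{t-1}}_t\, \hat{a}^{\rho_{t-1}}_{x,t}\, \sigma_\epsilon / \lambda \right)^2, \tag{22b}$$

$$\mathsf{E}^{(\psi_t, \hat{\pi}_t)}_{u_t}\left[\mu_t^2 \mid x_{t-1}, y_{t-1}, \phi_{t-1}\right] = \mathrm{Var}^{(\psi_t, \hat{\pi}_t)}_{u_t}\left[\mu_t \mid x_{t-1}, y_{t-1}, \phi_{t-1}\right] + \left( \mathsf{E}^{(\psi_t, \hat{\pi}_t)}_{u_t}\left[\mu_t \mid x_{t-1}, y_{t-1}, \phi_{t-1}\right] \right)^2, \tag{22c}$$

$$\mathsf{E}^{\hat{\pi}_t}_{v_t}\left[\mu_t \mid y_{t-1}, \phi_{t-1}\right] = \hat{a}^{\rho_{t-1}}_{y,t}\, y_{t-1} + \left( 1 + \hat{a}^{\rho_{t-1}}_{x,t} + \hat{a}^{\rho_{t-1}}_{\mu,t} \right) \mu_{t-1}, \tag{22d}$$

$$\mathrm{Var}^{\hat{\pi}_t}_{v_t}\left[\mu_t \mid y_{t-1}, \phi_{t-1}\right] = \left( \gamma^{\rho_{t-1}}_t\, \hat{a}^{\rho_{t-1}}_{x,t}\, \sigma_\epsilon / \lambda \right)^2 \left( 1 + \left(\hat{a}^{\rho_{t-1}}_{x,t}\right)^2 \rho_{t-1}^2 \right), \tag{22e}$$

$$\mathsf{E}^{\hat{\pi}_t}_{v_t}\left[\mu_t^2 \mid y_{t-1}, \phi_{t-1}\right] = \mathrm{Var}^{\hat{\pi}_t}_{v_t}\left[\mu_t \mid y_{t-1}, \phi_{t-1}\right] + \left( \mathsf{E}^{\hat{\pi}_t}_{v_t}\left[\mu_t \mid y_{t-1}, \phi_{t-1}\right] \right)^2. \tag{22f}$$

Proof. The lemma follows directly from taking expectations of the mean update equation (21). ■ 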
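
As a sanity check on the lemma, the sketch below evaluates the conditional mean of $\mu_t$ given the trader's action and compares it with the mean from the arbitrageur's viewpoint. It assumes the forms $\gamma_t = (1 + \hat{a}_{x,t}) / (1/\rho_{t-1}^2 + \hat{a}_{x,t}^2)$, (22a), and (22d) as read from the lemma; the function names and the numerical values are hypothetical. Because (22a) is linear in $u_t$, averaging it over the arbitrageur's belief amounts to evaluating it at the action implied by $x_{t-1} = \mu_{t-1}$.

```python
def gamma(rho_prev, a_x):
    """Gain term, assumed form (1 + a_x) / (1/rho^2 + a_x^2)."""
    return (1.0 + a_x) / (1.0 / rho_prev ** 2 + a_x ** 2)


def mean_given_action(u, y, mu, rho_prev, a_x, a_y, a_mu):
    """Assumed form of (22a): mean of mu_t given the trader's action u."""
    g = gamma(rho_prev, a_x)
    return (a_y * y + a_mu * mu + g * mu / rho_prev ** 2
            + g * a_x * (u - a_y * y - a_mu * mu))


def mean_for_arbitrageur(y, mu, a_x, a_y, a_mu):
    """Assumed form of (22d): mean of mu_t under the arbitrageur's belief."""
    return a_y * y + (1.0 + a_x + a_mu) * mu


rho_prev, a_x, a_y, a_mu = 3.0, -0.5, 0.1, 0.2
y, mu = 10.0, 100.0
u_expected = a_x * mu + a_y * y + a_mu * mu   # assumed policy at x = mu
lhs = mean_given_action(u_expected, y, mu, rho_prev, a_x, a_y, a_mu)
rhs = mean_for_arbitrageur(y, mu, a_x, a_y, a_mu)
```

The two values coincide, reflecting that the arbitrageur's expected future estimate equals the expected future position under the current belief.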

Theorem 2. If $U^*_t$ is TQD and $V^*_t$ is AQD, and Step 3 of Algorithm 2 produces a linear pair $(\pi^*_t, \psi^*_t)$, then 
$U^*_{t-1}$ and $V^*_{t-1}$, defined by Step 4 of Algorithm 2, are TQD and AQD, respectively. 

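Proof. Suppose that $U^*_t$ and $V^*_t$ have the TQD and AQD forms specified by (10)-(11), and that 

$$\pi^*_t(x_{t-1}, y_{t-1}, \phi_{t-1}) = a^{\rho_{t-1}}_{x,t}\, x_{t-1} + a^{\rho_{t-1}}_{y,t}\, y_{t-1} + a^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1},$$

$$\psi^*_t(y_{t-1}, \phi_{t-1}) = b^{\rho_{t-1}}_{y,t}\, y_{t-1} + b^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1}.$$

If the trader uses the policy $\pi^*_t$ and the arbitrageur uses the policy $\psi^*_t$, we have 

$$y_t = y_{t-1} + b^{\rho_{t-1}}_{y,t}\, y_{t-1} + b^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1}.$$

Using these facts, Theorem 1 and (22d)-(22f) from Lemma 1, we can explicitly compute 

$$V^*_{t-1}(y_{t-1}, \phi_{t-1}) = \left( G^{\pi^*_t}_{\psi^*_t} V^*_t \right)(y_{t-1}, \phi_{t-1}) = \mathsf{E}^{\pi^*_t}_{\psi^*_t}\left[ \lambda (u_t + v_t)\, y_{t-1} + V^*_t(y_t, \phi_t) \mid y_{t-1}, \phi_{t-1} \right]$$

$$= -\lambda \left( \frac{1}{2} d^{\rho_{t-1}}_{yy,t-1}\, y_{t-1}^2 + \frac{1}{2} d^{\rho_{t-1}}_{\mu\mu,t-1}\, \mu_{t-1}^2 + d^{\rho_{t-1}}_{y\mu,t-1}\, y_{t-1} \mu_{t-1} - \frac{\sigma_\epsilon^2}{\lambda^2}\, d^{\rho_{t-1}}_{0,t-1} \right),$$

where the coefficients $d^{\rho_{t-1}}_{yy,t-1}$, $d^{\rho_{t-1}}_{\mu\mu,t-1}$, $d^{\rho_{t-1}}_{y\mu,t-1}$, and $d^{\rho_{t-1}}_{0,t-1}$ are explicit functions of the policy 
coefficients, the coefficients of $V^*_t$, and $\rho_{t-1}$ (the displayed expressions are illegible in the source). 

Therefore, $V^*_{t-1}$ is AQD. Similarly, we can check that $U^*_{t-1}$ is TQD. 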



Theorem 3. Suppose that $U^*_t$ and $V^*_t$ are TQD/AQD value functions specified by (10)-(11), and $(\pi^*_t, \psi^*_t)$ are 
linear policies (the equation references for the policies are illegible in the source). Assume that, for all 
$\rho_{t-1}$, the policy coefficients satisfy the first order conditions (23)-(26) and the second order conditions (27), 
where the quantities appearing in (27) satisfy (28). (The displays (23)-(28) are illegible in the source.) 

Then, $(\pi^*_t, \psi^*_t)$ satisfy the single-stage equilibrium conditions 

$$\pi^*_t(x_{t-1}, y_{t-1}, \phi_{t-1}) \in \operatorname*{argmax}_{u_t} \left( F^{(\psi^*_t, \pi^*_t)}_{u_t} U^*_t \right)(x_{t-1}, y_{t-1}, \phi_{t-1}), \tag{29}$$

$$\psi^*_t(y_{t-1}, \phi_{t-1}) \in \operatorname*{argmax}_{v_t} \left( G^{\pi^*_t}_{v_t} V^*_t \right)(y_{t-1}, \phi_{t-1}), \tag{30}$$

for all $x_{t-1}$, $y_{t-1}$, and Gaussian $\phi_{t-1}$. 

Proof. As we will discuss in the proof of Theorem 4, the optimizing value $u^*_t$ in (29) is a linear function 
of $x_{t-1}$, $y_{t-1}$, and $\mu_{t-1}$, whose coefficients depend on the coefficients of the policies $\pi^*_t$ and $\psi^*_t$. By equating the 
coefficients of $\{x_{t-1}, y_{t-1}, \mu_{t-1}\}$ with $\{a^{\rho_{t-1}}_{x,t}, a^{\rho_{t-1}}_{y,t}, a^{\rho_{t-1}}_{\mu,t}\}$, respectively, we can obtain (23), (24), and (25). 
(26) can be derived by considering (30) in the same way. (27) corresponds to the second order conditions 
for the two maximization problems. ■ 

Theorem 4. If $U_t$ is TQD, $\psi_t$ is linear, and $\hat{\pi}_t$ is linear, then there exists a linear $\pi_t$ such that 

$$\pi_t(x_{t-1}, y_{t-1}, \phi_{t-1}) \in \operatorname*{argmax}_{u_t} \left( F^{(\psi_t, \hat{\pi}_t)}_{u_t} U_t \right)(x_{t-1}, y_{t-1}, \phi_{t-1}),$$

for all $x_{t-1}$, $y_{t-1}$, and Gaussian $\phi_{t-1}$, so long as the optimization problem is bounded. Similarly, if $V_t$ is 
AQD and $\hat{\pi}_t$ is linear then there exists a linear $\psi_t$ such that 

$$\psi_t(y_{t-1}, \phi_{t-1}) \in \operatorname*{argmax}_{v_t} \left( G^{\hat{\pi}_t}_{v_t} V_t \right)(y_{t-1}, \phi_{t-1}),$$

for all $y_{t-1}$ and Gaussian $\phi_{t-1}$, so long as the optimization problem is bounded. 
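Proof. Suppose that $U_t$ has the TQD form specified by (10), and that 

$$\hat{\pi}_t(x_{t-1}, y_{t-1}, \phi_{t-1}) = \hat{a}^{\rho_{t-1}}_{x,t}\, x_{t-1} + \hat{a}^{\rho_{t-1}}_{y,t}\, y_{t-1} + \hat{a}^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1},$$

$$\psi_t(y_{t-1}, \phi_{t-1}) = b^{\rho_{t-1}}_{y,t}\, y_{t-1} + b^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1}.$$

If the trader takes the action $u_t$, while the arbitrageur uses the policy $\psi_t$ and assumes that the trader uses the 
policy $\hat{\pi}_t$, we have 

$$x_t = x_{t-1} + u_t,$$

$$y_t = y_{t-1} + b^{\rho_{t-1}}_{y,t}\, y_{t-1} + b^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1}.$$

Using these facts, Theorem 1 and (22a)-(22c) from Lemma 1, we can explicitly compute 

$$\left( F^{(\psi_t, \hat{\pi}_t)}_{u_t} U_t \right)(x_{t-1}, y_{t-1}, \phi_{t-1}) = \mathsf{E}^{(\psi_t, \hat{\pi}_t)}_{u_t}\left[ \lambda (u_t + v_t)\, x_{t-1} + U_t(x_t, y_t, \phi_t) \mid x_{t-1}, y_{t-1}, \phi_{t-1} \right].$$

It is easy to see that $\left( F^{(\psi_t, \hat{\pi}_t)}_{u_t} U_t \right)(x_{t-1}, y_{t-1}, \phi_{t-1})$ is quadratic in $u_t$. Moreover, the coefficient of $u_t^2$ is 
independent of $\{x_{t-1}, y_{t-1}, \mu_{t-1}\}$ while the coefficient of $u_t$ is linear in $\{x_{t-1}, y_{t-1}, \mu_{t-1}\}$. Therefore, the 
optimizing $u^*_t$ is a linear function of $\{x_{t-1}, y_{t-1}, \mu_{t-1}\}$, whose coefficients can be computed by substitution 
and rearrangement of the resulting terms. 

Similarly, suppose that 

$$V_t(y_t, \phi_t) = -\lambda \left( \frac{1}{2} d^{\rho_t}_{yy,t}\, y_t^2 + \frac{1}{2} d^{\rho_t}_{\mu\mu,t}\, \mu_t^2 + d^{\rho_t}_{y\mu,t}\, y_t \mu_t - \frac{\sigma_\epsilon^2}{\lambda^2}\, d^{\rho_t}_{0,t} \right),$$

$$\hat{\pi}_t(x_{t-1}, y_{t-1}, \phi_{t-1}) = \hat{a}^{\rho_{t-1}}_{x,t}\, x_{t-1} + \hat{a}^{\rho_{t-1}}_{y,t}\, y_{t-1} + \hat{a}^{\rho_{t-1}}_{\mu,t}\, \mu_{t-1}.$$

If the arbitrageur takes the action $v_t$ and assumes that the trader uses the policy $\hat{\pi}_t$, we have 

$$y_t = y_{t-1} + v_t.$$

Using these facts, Theorem 1 and (22d)-(22f) from Lemma 1, we can explicitly compute 

$$\left( G^{\hat{\pi}_t}_{v_t} V_t \right)(y_{t-1}, \phi_{t-1}) = \mathsf{E}^{\hat{\pi}_t}_{v_t}\left[ \lambda (\hat{\pi}_t + v_t)\, y_{t-1} + V_t(y_t, \phi_t) \mid y_{t-1}, \phi_{t-1} \right].$$

It is easily checked that $\left( G^{\hat{\pi}_t}_{v_t} V_t \right)(y_{t-1}, \phi_{t-1})$ is quadratic in $v_t$. Moreover, the coefficient of $v_t^2$ is independent 
of $\{y_{t-1}, \mu_{t-1}\}$ while the coefficient of $v_t$ is linear in $\{y_{t-1}, \mu_{t-1}\}$. Therefore, the optimizing $v^*_t$ is a linear 
function of $\{y_{t-1}, \mu_{t-1}\}$, whose coefficients can be computed by substitution and rearrangement of the resulting 
terms. ■ 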
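
The final step of the argument, that a quadratic objective with state-independent curvature and a state-linear slope has a linear maximizer, can be illustrated in a few lines. The sketch is generic: `q` and `k` are hypothetical stand-ins for the curvature and the linear-term coefficients and are not the paper's expressions, and the check `q > 0` plays the role of the boundedness condition.

```python
def linear_argmax(q, k, state):
    """Maximize -0.5*q*u**2 + (k . state)*u over u, assuming q > 0.

    The first order condition -q*u + (k . state) = 0 gives
    u* = (k . state) / q, which is linear in the state.
    """
    if q <= 0:
        raise ValueError("objective is unbounded above unless q > 0")
    slope = sum(ki * si for ki, si in zip(k, state))
    return slope / q


k = (1.0, -0.5, 0.25)            # hypothetical slope coefficients
s1 = (4.0, 2.0, 8.0)             # a state (x, y, mu)
s2 = (1.0, 0.0, -4.0)
u1 = linear_argmax(2.0, k, s1)
u2 = linear_argmax(2.0, k, s2)
u_sum = linear_argmax(2.0, k, tuple(a + b for a, b in zip(s1, s2)))
```

The maximizer for the summed state equals the sum of the individual maximizers, which is the linearity that yields the linear policies $\pi_t$ and $\psi_t$.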